I.  INTRODUCTION

In optimal deterministic control theory, the basic assumption is
made that the effect of any future control action can be deduced exactly
from the present state and the dynamical equation. In many situations,
the necessity for control arises from the fact that there are disturbances
and/or component failures in the physical system. These random phenomena
prevent exact determination of the effect of all future actions, and
therefore deterministic theory is not strictly applicable. If the effect of
these random phenomena is small, one can still use optimal control theory
to obtain a feedback control law based on deterministic considerations.

The  feedback  nature  of  the  control  would  tend  to  reduce  the  sensitivity 
to  uncertainties  but  would  require  the  state  of  the  system  to  be  measured 
exactly.  Again,  this  assumption  is  good  only  when  the  measurement  error 
is  small  in  comparison  with  the  signal  being  measured. 

In many cases, the phenomena of uncertainty (including measurement
error) can be appropriately modelled as stochastic processes, allowing
them to be considered via stochastic optimal control theory. Using the
Principle of Optimality, one can reduce the stochastic optimal control
problem to that of solving a stochastic dynamic programming equation.

Unfortunately, this equation cannot be solved numerically in most situations.
In this report, a new approach toward a practical solution for stochastic
control problems is described. This report represents Part I of a one-year
study supported by the Air Force Office of Scientific Research (AFOSR
Project No. F44620-71-C-0C77): Development of Dual Control and Identification
Methods for Avionic Systems. Part II of the study, Input Design for
Identification, is discussed in a separate report.


1.1  Main  Purposes 


In 1960, Feldbaum, in a series of three papers, introduced dual control
theory [F2]. His approach is a combination of statistical decision theory
and dynamic programming. He pointed out abstractly that the control signal
has two purposes that might be conflicting: one is to learn about any
unknown parameters and/or the state of the system; the other is to achieve
the control objective. Thus the best control must have the characteristic
of appropriately distributing its energy for learning and control
purposes. However, no further development or algorithms that implement
these ideas appear in the literature. Feldbaum used a static example
to demonstrate his dual control theory, but it is difficult to visualize
how a dual control will work in a dynamic situation. One of the main
purposes of this study is to provide a deeper understanding of dual
control theory for dynamical systems. Another objective is to develop
an approach toward obtaining (or approximating) a near-optimal dual
control that can be implemented, with the objective of indicating the
potential applications of the results to Air Force problems.

1.2  Outline  of  the  Report 

In Section II, optimal stochastic control theory is reviewed and
the practical difficulties in computing and realizing the optimal control
law are pointed out, both serving as a motivation for the development of
the later sections.

In Section III, the stochastic control problem is reformulated in
light of the dual nature of the control, and a one-step optimal dual
control strategy which possesses an active learning characteristic is
obtained. This result is new, and in fact is entirely different from
the other suboptimal approaches reported in the literature.

In Section IV, the results are specialized to a very important
class of problems, that of controlling a time-varying linear system with
random parameters, and a specific algorithm is developed for this class of
problems. Since the derived algorithm is rather complicated, illustrative
examples are presented to provide understanding of the dual nature
of the resulting control strategy.

In Section V, three example problems, described in detail, are
intended to (1) demonstrate the computational feasibility of the new
algorithm, (2) demonstrate the performance level of the new algorithm, and
(3) provide more insight into dual control theory.

In Section VI, potential applications of the results obtained
during this research are indicated, and recommendations are made for
areas for future research.


1.3  Summary  of  Contributions 

A new formulation and a new stochastic control algorithm for general
nonlinear stochastic systems have been developed. The algorithm possesses
an active learning characteristic that is lacking in the existing
suboptimal stochastic control algorithms described in the literature.
Simulation studies demonstrate that this algorithm is potentially feasible
for large classes of Air Force problems. Sizable improvement over the
widely used certainty equivalence suboptimal control policy is demonstrated in
the examples being considered. The important class of problems of
controlling a linear time-varying system with random parameters is treated in
detail, and a specific algorithm for this class of problems is obtained.
Simulation studies on some example problems provide certain insights into the
dual nature of the control. Also, these examples represent the only
complete simulation studies on dual control in the literature.

1.4  Notations 


Throughout the report, lower case underscored letters stand for vectors
(e.g., $x$, $\theta$); upper case underscored letters stand for matrices
(e.g., $A$, $\Sigma$). Noise disturbances are denoted by lower case
underscored Greek letters (e.g., $\xi$, $\eta$).

The transpose of a matrix $A$ is denoted by $A'$. The transpose of a
column vector, $x$, is a row vector and is denoted by $x'$.


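Let $A$ be an $n \times n$ square matrix; the trace of $A$ is defined as

$$\operatorname{tr} A \triangleq \sum_{i=1}^{n} a_{ii} . \tag{1.1}$$

Using the convention that a vector is always in column form, one has the
gradient operator

$$\nabla_x \triangleq \left[ \frac{\partial}{\partial x_1} \ \cdots \ \frac{\partial}{\partial x_n} \right]' , \tag{1.2}$$

with $\nabla_\theta$ defined in the same way with respect to the components
of $\theta$.

The gradient of the scalar function $H(x,\theta)$, a column vector, is written as

$$H_x \triangleq \nabla_x H . \tag{1.3}$$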


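The Jacobian of the m-vector $f$ is the $m \times n$ matrix

$$f_x \triangleq \frac{\partial f}{\partial x} =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}
= (\nabla_x f')' . \tag{1.4}$$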


Accordingly,

$$H_{x\theta} \triangleq \nabla_x \nabla_\theta' H = H_{\theta x}' \tag{1.5}$$

$$H_{xx} \triangleq \nabla_x \nabla_x' H . \tag{1.6}$$


The natural base in $R^n$ is denoted by $\{e_i\}_{i=1}^{n}$, where

$$e_i = [\,0 \ \cdots \ 0 \ \ 1 \ \ 0 \ \cdots \ 0\,]' \quad \text{(1 in the } i\text{th component)} . \tag{1.7}$$

II.  OPTIMAL  STOCHASTIC  CONTROL

In this section, the formulation and solution of the optimal
stochastic control problem for discrete-time systems are discussed, as are the
difficulties associated with the solution procedures. These difficulties
motivate the specific dual control approach presented in Section III.

2.1  Problem  Statement 

Consider a discrete-time nonlinear stochastic system described by

$$\begin{aligned}
x(k+1) &= f[k, x(k), u(k)] + \xi(k) , \\
y(k) &= h[k, x(k)] + \eta(k) , \qquad k = 0, 1, \ldots, N-1
\end{aligned} \tag{2.1}$$

where $x(k) \in R^n$, $u(k) \in R^r$, and $y(k) \in R^m$. It is assumed that $x(0)$,
$\{\xi(k), \eta(k+1)\}_{k=0}^{N-1}$ are independent Gaussian vectors with statistics:

$$E\{x(0)\} = \hat{x}(0|0) ; \quad \mathrm{Cov}\{x(0)\} = \Sigma(0|0) \tag{2.2}$$

$$E\{\xi(k)\} = 0 ; \quad \mathrm{Cov}\{\xi(k)\} = Q(k) \tag{2.3}$$

$$E\{\eta(k+1)\} = 0 ; \quad \mathrm{Cov}\{\eta(k+1)\} = R(k+1) \tag{2.4}$$


Consider further the performance measure

$$J = E\Big\{ \psi[x(N)] + \sum_{k=0}^{N-1} \mathcal{L}[x(k), u(k), k] \Big\} \tag{2.5}$$


where the expectation $E\{\cdot\}$ is taken over all underlying random quantities.
Finally, consider admissible controls of the feedback type:

$$u(k) = u(k, Y^k, U^{k-1}) ; \quad Y^k \triangleq \{y(1), \ldots, y(k)\} , \quad U^{k-1} \triangleq \{u(0), \ldots, u(k-1)\} \tag{2.6}$$

The goal is to find the optimal control sequence $\{u^*(k)\}_{k=0}^{N-1}$ that is of the form
(2.6) and minimizes the cost (2.5) subject to the dynamic constraint (2.1).
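The problem statement can be made concrete with a small simulation. The following is a minimal sketch, assuming a scalar system with time-invariant noise variances; the functions `simulate` and `expected_cost`, the example dynamics, and the proportional policy are all hypothetical and chosen only for illustration. The policy receives only past observations and past controls, mirroring the admissible form (2.6), and the cost (2.5) is estimated by Monte Carlo averaging.

```python
import numpy as np

def simulate(f, h, policy, x0_mean, x0_var, q_var, r_var, N, rng):
    """Simulate one run of a scalar instance of (2.1) under a feedback policy.

    policy(k, ys, us) sees only past observations and past controls,
    which is the information pattern of (2.6).
    """
    x = x0_mean + np.sqrt(x0_var) * rng.standard_normal()
    xs, ys, us = [x], [], []
    for k in range(N):
        u = policy(k, ys, us)
        x = f(k, x, u) + np.sqrt(q_var) * rng.standard_normal()   # xi(k)
        y = h(k + 1, x) + np.sqrt(r_var) * rng.standard_normal()  # eta(k+1)
        xs.append(x)
        ys.append(y)
        us.append(u)
    return np.array(xs), np.array(ys), np.array(us)

def expected_cost(psi, stage, n_runs, seed=0, **sim_args):
    """Monte Carlo estimate of the cost (2.5)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_runs):
        xs, _, us = simulate(rng=rng, **sim_args)
        total += psi(xs[-1]) + sum(stage(xs[k], us[k], k) for k in range(len(us)))
    return total / n_runs

def proportional_policy(k, ys, us):
    # Hypothetical policy: feed back the latest observation, if any.
    return -0.5 * ys[-1] if ys else 0.0

# Assumed example system: x(k+1) = 0.9 x(k) + u(k) + xi(k), y(k) = x(k) + eta(k).
sim_args = dict(f=lambda k, x, u: 0.9 * x + u, h=lambda k, x: x,
                policy=proportional_policy, x0_mean=1.0, x0_var=0.5,
                q_var=0.1, r_var=0.2, N=10)
J = expected_cost(psi=lambda x: x**2, stage=lambda x, u, k: x**2 + u**2,
                  n_runs=200, **sim_args)
```

Finding the policy that minimizes this averaged cost over all admissible feedback laws is the optimization problem posed above; the sketch only evaluates one fixed policy.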


2.2  Optimal  Stochastic  Control  Solution  Method 


To solve the optimal control problem stated in Section 2.1, Bayes'
rule and dynamic programming are used. A complete derivation of the
optimal solution is given by Meier [M1]; therefore, we shall only outline
the derivation and summarize the results below.

An important concept is the information state. This can be viewed
as a quantity which is equivalent† to the observation process $Y^k$ and all
a priori knowledge of the system and $U^{k-1}$ in describing the future
evolution of the system. Thus, an information state will summarize all the
sufficient information content conveyed by the observation process $Y^k$ and
past control sequence $U^{k-1}$. Clearly, the combined sequence $(Y^k, U^{k-1})$ is an
information state. If we denote this information state by

$$\mathcal{P}^1_k \triangleq (Y^k, U^{k-1}) \tag{2.7}$$

then a recursion relation for $\mathcal{P}^1_k$ is

$$\mathcal{P}^1_{k+1}[\mathcal{P}^1_k, u(k)] = \big\{ h[k+1, f[k, x(k), u(k)] + \xi(k)] + \eta(k+1), \ \mathcal{P}^1_k, \ u(k) \big\} \tag{2.8}$$

where $\xi(k)$, $\eta(k+1)$ and $x(k)$ are random vectors. Another such information
state is the conditional density, $\mathcal{P}^2_k \triangleq p[x(k)|Y^k, U^{k-1}]$. Using Bayes' rule,
a recursive equation for the conditional density is given by [M1]

$$\mathcal{P}^2_{k+1}[\mathcal{P}^2_k, u(k)] = \frac{1}{C_{k+1}} \, p[y(k+1)|x(k+1)] \int p[x(k+1)|x(k), u(k)] \, \mathcal{P}^2_k \, dx(k) \tag{2.9}$$

where $C_{k+1}$ is a normalizing constant. Next, we can use the principle
of optimality in the "information state" space, which gives us
the stochastic dynamic programming equation (see also Meier [M1]):

$$I^*\{\mathcal{P}_k, k\} = \min_{u(k)} E\Big\{ \mathcal{L}[x(k), u(k), k] + I^*\{\mathcal{P}_{k+1}[\mathcal{P}_k, u(k)], k+1\} \ \Big| \ Y^k, U^{k-1} \Big\} \tag{2.10}$$

† A precise definition of equivalent statistics is given by Streibel [S3].

where $u(k)$ is a deterministic quantity, $\mathcal{P}_k$ is an information state (it can be
either $\mathcal{P}^1_k$ or $\mathcal{P}^2_k$), and $I^*\{\cdot, k\}$ denotes the optimal cost-to-go associated
with the information state at time k. If we use $\mathcal{P}^1_k$ as an information state,
then the optimal control can be obtained by solving (2.10) with the recursion
(2.8), where an optimal feedback table, $\{u^*(Y^k, U^{k-1})\}$, is constructed for all
possible pairs $(Y^k, U^{k-1})$, $k = 0, 1, \ldots, N-1$. On the other hand, if we use
$\mathcal{P}^2_k$ as an information state, then the optimal control can be obtained by the
following separate procedures:

A.  Control - The optimum control law is found as a function of
the conditional density $p[x(k)|Y^k, U^{k-1}]$ by solving the
stochastic dynamic programming equation (2.10). In general,
this can be an off-line procedure.

B.  Estimation - The conditional density is updated by use of the
recursion relation (2.9), and the optimum input is obtained
from the optimum control law. The updating of the conditional
density must be done in real time.
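The estimation step B can be illustrated numerically. The following is a minimal sketch of one Bayes update of the form (2.9) for a scalar state, assuming Gaussian process and measurement noise and a density stored on a uniform grid; the names `gaussian_pdf` and `bayes_update` and all numerical values are hypothetical. Integrals are replaced by Riemann sums, which is one simple approximation among many and is not the parallel estimation procedure cited in the text.

```python
import numpy as np

def gaussian_pdf(z, var):
    """Zero-mean Gaussian density with variance var, evaluated at z."""
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)

def bayes_update(grid, p_k, u, y_next, f, h, q_var, r_var):
    """One step of the recursion (2.9) on a uniform grid (scalar state).

    p_k holds p[x(k) | Y^k, U^{k-1}] at the grid points; the return value
    holds p[x(k+1) | Y^{k+1}, U^k] at the same points.
    """
    dx = grid[1] - grid[0]
    # transition[i, j] = p[x(k+1) = grid[i] | x(k) = grid[j], u(k)]
    transition = gaussian_pdf(grid[:, None] - f(grid[None, :], u), q_var)
    predicted = transition @ p_k * dx               # integral over x(k)
    unnormalized = gaussian_pdf(y_next - h(grid), r_var) * predicted
    return unnormalized / (unnormalized.sum() * dx)  # division by C_{k+1}

# Assumed linear-Gaussian example, so the result can be checked by hand.
grid = np.linspace(-10.0, 10.0, 2001)
dx = grid[1] - grid[0]
prior = gaussian_pdf(grid - 1.0, 0.5)
posterior = bayes_update(grid, prior, u=0.2, y_next=1.5,
                         f=lambda x, u: 0.9 * x + u, h=lambda x: x,
                         q_var=0.1, r_var=0.2)
posterior_mean = (grid * posterior).sum() * dx
```

Even in this scalar case the density is carried as 2001 numbers, which gives a feel for why the conditional density is an awkward information state for the control step A.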

2.3  Difficulties  Associated  with  the  Optimal  Solution  Procedure 

Theoretically, the optimal control problem has been solved when equations
(2.9) and (2.10) are derived; however, in practice, the problem only begins
with these equations. In the following, we discuss the difficulties
associated with the solution procedures using either $\mathcal{P}^1_k$ or $\mathcal{P}^2_k$ as the
information state. This will motivate our development in the next section.

From (2.7), we note that the dimension of $\mathcal{P}^1_k$ grows linearly in k. Thus,
even with appropriate quantizing, the number of quantization points, which
grows in time, will soon become too large to be handled by a computer of any
size. Note that the expectation in (2.10) requires the availability of the
conditional density, $p[x(k)|Y^k, U^{k-1}]$, which is usually infinite dimensional.
This adds one more "dimension" of difficulty in carrying out the dynamic
programming procedure. In general, the optimal cost-to-go function,
$I^*[\cdot, \cdot]$, cannot be expressed as an analytical function of the
information state. Thus, direct solution of (2.10) becomes practically
impossible for any computer. We face a similar kind of difficulty even if we
use $\mathcal{P}^2_k$ as the information state. In this case the information state,
$\mathcal{P}^2_k$, is usually of infinite dimension for all $k \geq 1$. One may attempt to
approximate the solution of (2.10). However, even if this can be done, it
still does not solve the dimensionality problem, since in general, the
approximate optimal control law is nonlinear in the information state, and
can only be expressed as a table look-up type of function of the information
state. This prohibits functional realization of the optimal control law, and
thus real-time generation of the optimum control value is practically
impossible for most problems.




Note that the basic difficulty is in the control rather than in the estimation
procedure. The updating of the density, although a difficult problem in itself,
can be approximated reasonably efficiently by using parallel estimation
procedures. Some recent results [ ], [T3], [A3], [L1] indicate the feasibility
of parallel estimation. We should emphasize the fact that the capability of
approximating the conditional density does not solve half the problem, because
the difficulty in obtaining the optimal control in real time is not so much
due to the estimation procedure as to the growth in dimensionality and to
the fact that even if an optimal control law is obtained, the extremely large
(perhaps infinite) number of possible information states will prevent it from
being realizable.

In the special case where the system (2.1) is linear, the conditional
density $p[x(k)|Y^k, U^{k-1}]$ is equivalent to the conditional mean estimate $\hat{x}(k|k)$
(see Streibel [S3], Meier, and Tse), which is a finite dimensional
vector generated by the Kalman filter. If, in addition, the cost is quadratic,
then the optimal cost-to-go $I^*[\cdot, k]$ can be expressed analytically as a
function of $\hat{x}(k|k)$, so that equation (2.10) can be solved exactly to yield a
realizable linear feedback law (Joseph and Tou [J1]; Meier, Larson and Tether
[M2]; Streibel [S3]; Tse). This result is known as the Separation Theorem
or Certainty Equivalence Principle.
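For the linear-quadratic case the structure described above is simple enough to write down. The following is a minimal sketch for a scalar system, assuming dynamics $x(k+1) = a\,x(k) + b\,u(k) + \xi(k)$, measurement $y(k) = c\,x(k) + \eta(k)$, and a cost of the form $q_N x^2(N) + \sum_k [q\,x^2(k) + r\,u^2(k)]$; these parameter names, the cost scaling, and the function names are assumptions for illustration. The feedback gains come from a backward Riccati recursion that does not involve the noise statistics, and the control applied is $u(k) = -G(k)\,\hat{x}(k|k)$ with $\hat{x}(k|k)$ supplied by the Kalman filter.

```python
def lq_gains(a, b, q, r, q_terminal, N):
    """Backward Riccati recursion for a scalar LQ problem.

    Returns the feedback gains G(0..N-1) and the cost-to-go weight S(0).
    """
    S = q_terminal
    gains = [0.0] * N
    for k in reversed(range(N)):
        gains[k] = a * b * S / (r + b * b * S)
        S = q + a * a * S - a * b * S * gains[k]
    return gains, S

def kalman_step(x_hat, sigma, u, y_next, a, b, c, q_var, r_var):
    """One predict/update cycle of the scalar Kalman filter."""
    x_pred = a * x_hat + b * u
    sigma_pred = a * sigma * a + q_var
    gain = sigma_pred * c / (c * sigma_pred * c + r_var)
    x_new = x_pred + gain * (y_next - c * x_pred)
    sigma_new = (1.0 - gain * c) * sigma_pred
    return x_new, sigma_new

# Certainty equivalence: the same gains are applied to the estimate
# as would be applied to the state if it were known exactly.
gains, S0 = lq_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_terminal=1.0, N=1)
x_hat, sigma = 2.0, 0.5
u0 = -gains[0] * x_hat
```

Note that the covariance `sigma` never enters the control computation here, which is exactly the absence of active learning that the dual control approach is meant to address.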
In the literature, the most popular approximation method used for
combined estimation and control is linearization of the plant about the
deterministic optimal trajectory and application of the well-known
separation theorem to the resulting perturbation equations. However, this
may not give good performance if the system is very nonlinear and the
noise level is high. This is because with the linearization approach
the control action is corrected only after it has been discovered that the
trajectory has deviated from the nominal. But, in fact, if it is known
that a disturbance will occur in the future, the control should be modified
before as well as after the disturbance occurs in order to minimize its
effects. Therefore, if linearization is to be used, some nominal
trajectory other than the deterministic optimal trajectory should be used.
Denham, Meier, and Vander Stoep considered the problem of choosing
a nominal path to minimize a certain cost criterion using second order
analysis of the perturbed system along the nominal path. The advantage of
these approaches is the simplicity of the resulting control law. The main
drawback is the validity of assuming a nominal trajectory. This assumption
is unjustified if there are uncontrollable unknown parameters in the system.
A much more "adaptive" type of controller would be desirable.


The open-loop feedback optimal approach suggested by Dreyfus, and
applied to specific problems by Tse and Athans, Bar-Shalom and Sivan [B2],
Curry [C1], Aoki [A1] and Spang [S2], suffers from the drawback that the
resulting control is passive in learning: the decision of the control action
does not anticipate the fact that future learning is possible. An extension of
this approach, the m-measurement feedback control suggested by Curry [C1], is
only slightly less complicated than the optimal approach. To the authors'
knowledge, no successful application of this method has been reported in the
literature.


All these approaches take into account the past observation information
but ignore the future observation program. In the next section, we shall
describe a new method which is based on the Principle of Optimality on the
information state $\mathcal{P}^2_k$ and the concept of dual control. In contrast to the
previous approaches, this method will explicitly take into account not only
the past observation information but also the future observation program.

III.  DUAL  CONTROL  FOR  STOCHASTIC  SYSTEMS:  AN  ACTIVE  LEARNING  PROCEDURE

In Section II, it was noted that the main difficulties in implementing
the optimal control law are:

(1)  The information state is either infinite dimensional or finite
but grows with time,

(2)  The optimal cost-to-go associated with the information state
is generally a non-analytic function,

(3)  Storage of the control value associated with each information
state at time k, $k = 0, \ldots, N-1$, is practically impossible due to
the large dimensionality.

Thus a reasonable suboptimal approach would be to

(1)  Reduce the dimension of the information state space so that it
stays a constant dimension for all time,

(2)  Approximate the optimal cost-to-go associated with each
information state at time k, $k = 0, \ldots, N-1$,

(3)  Compute the control value on-line rather than obtain the
feedback law off-line and store the whole "feedback table."

Each of these procedures is discussed in detail in the following
subsections. To simplify the discussion, assume that the cost is of the
form:

$$\mathcal{L}[x(k), u(k), k] = L[x(k), k] + \phi[u(k), k]$$

The extension to the more general cost is straightforward.

3.1  Wide-Sense  Adaptive  Control

As discussed in Section II, the information state, $p[x(k)|Y^k, U^{k-1}]$,
is generally of infinite dimension. One approach to reduce this dimension
is to use the "wide-sense" property; in this approach the controller
is restricted to the form

$$u(k) = u[k, \hat{x}(k|k), \Sigma(k|k)] \tag{3.1}$$

where

$$\hat{x}(k|k) \triangleq E\{x(k)|Y^k, U^{k-1}\} \tag{3.2}$$

$$\Sigma(k|k) \triangleq \mathrm{Cov}\{x(k)|Y^k, U^{k-1}\} . \tag{3.3}$$

We shall call such a control scheme the wide-sense adaptive control law.

The computation of $\hat{x}(k|k)$, $\Sigma(k|k)$ can be carried out by any one of the
following methods:

(1)  Extended Kalman Filter

(2)  Adaptive Filter with Tuning [T5]

(3)  Second Order Filter

(4)  Parallel Estimator [T3]

Depending on the specific problem under investigation, one of these methods
may be more appropriate than the others.

3.2  Perturbation  Control  and  the  Dual  Cost

Before  going  into  the  new  approximation  procedure,  consider  first  the 
perturbation  control  problem  and  obtain  a  cost  that  exhibits  the  dual 
property  of  the  control. 

The present time is indexed by k. Let us assume that $U^{k-1}$
has been applied to the system, and that the observation sequence $Y^k$ has
been obtained. The conditional mean, $\hat{x}(k|k)$, and covariance, $\Sigma(k|k)$, are
assumed available from a learning device, an estimator. Consider a nominal
open-loop control sequence $U_o(k, N-1) \triangleq \{u_o(j)\}_{j=k}^{N-1}$ and the associated nominal
trajectory

$$x_o(j+1) = f[j, x_o(j), u_o(j)] ; \quad j = k, \ldots, N-1 \tag{3.4}$$

with initial condition $x_o(k) = \hat{x}(k|k)$.

Let $\delta x(j)$ be a small perturbation about the nominal path due to the
disturbance $\xi(j)$ and a perturbation control $\delta u(j)$. The true trajectory and
control are given by

$$x(j) = x_o(j) + \delta x(j) ; \quad u(j) = u_o(j) + \delta u(j) \tag{3.5}$$

with $x(j)$, $u(j)$ satisfying (2.1). Since $\delta x(j)$, $\delta u(j)$ are assumed to be
small, we can approximate the cost-to-go by expanding it up to second order:

$$\begin{aligned}
J(k) &= E\Big\{ \psi[x(N)] + \sum_{j=k}^{N-1} \big[ L[x(j), j] + \phi[u(j), j] \big] \ \Big| \ Y^k \Big\} \\
&= J_o(k) + E\Big\{ \psi_{o,x}' \, \delta x(N) + \tfrac{1}{2} \delta x'(N) \, \psi_{o,xx} \, \delta x(N) \\
&\qquad + \sum_{j=k}^{N-1} \big[ L_{o,x}'(j) \, \delta x(j) + \tfrac{1}{2} \delta x'(j) \, L_{o,xx}(j) \, \delta x(j) \\
&\qquad\qquad + \phi_{o,u}'(j) \, \delta u(j) + \tfrac{1}{2} \delta u'(j) \, \phi_{o,uu}(j) \, \delta u(j) \big] \ \Big| \ Y^k \Big\}
\end{aligned} \tag{3.6}$$

where

$$J_o(k) \triangleq \psi[x_o(N)] + \sum_{j=k}^{N-1} \big\{ L[x_o(j), j] + \phi[u_o(j), j] \big\} . \tag{3.7}$$

The quantities $\psi_{o,x}$ and $\psi_{o,xx}$ are, respectively, the gradient and Hessian
of $\psi(\cdot)$ with respect to $x$ evaluated along the nominal trajectory. For a
fixed nominal, choosing $\delta u(j)$, $j = k, \ldots, N-1$, to minimize the incremental
cost $\Delta J(k) \triangleq J(k) - J_o(k)$, one obtains a cost $J^*[k, U_o(k, N-1)]$ associated with
the nominal control $U_o(k, N-1)$.


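Let us consider the perturbation control problem. From (3.6), we have

$$\begin{aligned}
\Delta J(k) &= J(k) - J_o(k) \\
&= E\Big\{ \psi_{o,x}' \, \delta\hat{x}(N|N) + \tfrac{1}{2} \delta\hat{x}'(N|N) \, \psi_{o,xx} \, \delta\hat{x}(N|N) + \tfrac{1}{2} \operatorname{tr}\{ \psi_{o,xx} \Sigma_o(N|N) \} \\
&\qquad + \sum_{j=k}^{N-1} \big[ L_{o,x}'(j) \, \delta\hat{x}(j|j) + \tfrac{1}{2} \delta\hat{x}'(j|j) \, L_{o,xx}(j) \, \delta\hat{x}(j|j) + \tfrac{1}{2} \operatorname{tr}\{ L_{o,xx}(j) \Sigma_o(j|j) \} \\
&\qquad\qquad + \phi_{o,u}'(j) \, \delta u(j) + \tfrac{1}{2} \delta u'(j) \, \phi_{o,uu}(j) \, \delta u(j) \big] \ \Big| \ Y^k \Big\}
\end{aligned} \tag{3.8}$$

where $\delta\hat{x}(j|j) \triangleq E\{\delta x(j)|Y^j\}$ and $\Sigma_o(j|j) \triangleq \mathrm{Cov}\{\delta x(j)|Y^j\}$. The problem is
to minimize $\Delta J(k)$, subject to the dynamic constraints of the second order
incremental process.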


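Application of dynamic programming with retention of up to second order
terms yields the following (the derivations of (3.9)-(3.17) can be found in
Appendix A):

$$\delta u^*(j) = -\big[ H_{o,uu}(j) + f_{o,u}'(j) K_o(j+1) f_{o,u}(j) \big]^{-1} \Big\{ \big[ f_{o,u}'(j) K_o(j+1) f_{o,x}(j) + H_{o,ux}(j) \big] \delta\hat{x}(j|j) + H_{o,u}(j) \Big\} \tag{3.9}$$

where

$$H_o(j) \triangleq L[x_o(j), j] + \phi[u_o(j), j] + p_o'(j+1) \, f[j, x_o(j), u_o(j)] \tag{3.10}$$

$$\begin{aligned}
p_o(j) &= H_{o,x}(j) - \big[ f_{o,u}'(j) K_o(j+1) f_{o,x}(j) + H_{o,ux}(j) \big]' \\
&\qquad \cdot \big[ H_{o,uu}(j) + f_{o,u}'(j) K_o(j+1) f_{o,u}(j) \big]^{-1} H_{o,u}(j) ; \quad p_o(N) = \psi_{o,x}
\end{aligned} \tag{3.11}$$

$$\begin{aligned}
K_o(j) &= f_{o,x}'(j) K_o(j+1) f_{o,x}(j) - \big[ f_{o,u}'(j) K_o(j+1) f_{o,x}(j) + H_{o,ux}(j) \big]' \\
&\qquad \cdot \big[ H_{o,uu}(j) + f_{o,u}'(j) K_o(j+1) f_{o,u}(j) \big]^{-1} \\
&\qquad \cdot \big[ f_{o,u}'(j) K_o(j+1) f_{o,x}(j) + H_{o,ux}(j) \big] + H_{o,xx}(j) ; \quad K_o(N) = \psi_{o,xx}
\end{aligned} \tag{3.12}$$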


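and the optimal cost associated with the nominal $U_o(k, N-1)$ is given by

$$\begin{aligned}
J^*[k, U_o(k, N-1)] &= J_o(k) + g_o(k) + \tfrac{1}{2} \operatorname{tr}\Big\{ \psi_{o,xx} \Sigma_o(N|N) + \sum_{j=k}^{N-1} \big[ H_{o,xx}(j) \Sigma_o(j|j) \\
&\qquad + [\Sigma_o(j+1|j) - \Sigma_o(j+1|j+1)] K_o(j+1) \big] \Big\}
\end{aligned} \tag{3.13}$$

with $g_o(j)$ satisfying

$$g_o(j) = g_o(j+1) - \tfrac{1}{2} H_{o,u}'(j) \big[ H_{o,uu}(j) + f_{o,u}'(j) K_o(j+1) f_{o,u}(j) \big]^{-1} H_{o,u}(j) ; \quad g_o(N) = 0 \tag{3.14}$$

and $\Sigma_o(j|j)$ is the future error covariance, which is assumed to be generated
by the extended Kalman filter:

$$\Sigma_o(j+1|j+1) = [I - V_o(j+1) h_{o,x}(j+1)] \Sigma_o(j+1|j) ; \quad j = k, \ldots, N-1 ; \quad \Sigma_o(k|k) = \Sigma(k|k) \tag{3.15}$$

$$V_o(j+1) = \Sigma_o(j+1|j) h_{o,x}'(j+1) \big[ h_{o,x}(j+1) \Sigma_o(j+1|j) h_{o,x}'(j+1) + R(j+1) \big]^{-1} \tag{3.16}$$

$$\Sigma_o(j+1|j) = f_{o,x}(j) \Sigma_o(j|j) f_{o,x}'(j) + Q(j) . \tag{3.17}$$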
Note that the updated and one-step prediction error covariances, $\Sigma_o(j+1|j+1)$
and $\Sigma_o(j+1|j)$, are dependent on the choice of the nominal control $U_o(k, N-1)$.
The cost $J^*[k, U_o(k, N-1)]$ associated with $U_o(k, N-1)$ involves:

•  Control cost $J_o(k)$

•  Estimation cost: the remaining terms involving nonnegative
weightings of error covariances

For this reason, $J^*[k, U_o(k, N-1)]$ will be called the dual cost associated with
the nominal $U_o(k, N-1)$. We shall comment on the existence of $p_o(j)$ and $K_o(j)$
in Section 3.4, item 5.
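The dependence of the future covariances on the nominal control can be seen in a small numerical experiment. The following is a minimal sketch of the recursions (3.15)-(3.17) for a scalar state, assuming hypothetical dynamics $f(x,u) = x + u$ and a hypothetical measurement $h(x) = x^2$, so that the measurement sensitivity $h_x = 2x$ vanishes at the origin; the function name and all numerical values are assumptions for illustration only.

```python
def covariance_along_nominal(x0, sigma0, u_nominal, f, f_x, h_x, q_var, r_var):
    """Propagate the scalar error covariance along a nominal trajectory.

    Implements the prediction (3.17), gain (3.16), and update (3.15),
    with all derivatives evaluated on the nominal path (3.4).
    """
    x, sigma = x0, sigma0
    history = [sigma]
    for u in u_nominal:
        fx = f_x(x, u)
        sigma_pred = fx * sigma * fx + q_var                 # (3.17)
        x = f(x, u)                                          # nominal path (3.4)
        hx = h_x(x)
        gain = sigma_pred * hx / (hx * sigma_pred * hx + r_var)  # (3.16)
        sigma = (1.0 - gain * hx) * sigma_pred               # (3.15)
        history.append(sigma)
    return history

# Assumed example: f(x, u) = x + u, h(x) = x**2 (so h_x = 2x).
model = dict(f=lambda x, u: x + u, f_x=lambda x, u: 1.0,
             h_x=lambda x: 2.0 * x, q_var=0.1, r_var=0.5)

# A nominal that stays at the origin learns nothing from the measurements...
passive = covariance_along_nominal(0.0, 1.0, [0.0] * 5, **model)
# ...while a nominal that moves away from the origin makes the state observable.
active = covariance_along_nominal(0.0, 1.0, [1.0, 0.0, 0.0, 0.0, 0.0], **model)
```

In this example the second nominal spends control effort but ends with a much smaller error covariance, which is the trade-off between the control cost and the estimation cost that the dual cost weighs.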

3.3  One-Step  Optimal  Dual  Control 

The outline of the one-step optimal dual control procedure, which is
the main result of this report, is as follows. It is assumed that at the
present time k, one can apply an arbitrary control $u(k)$. From time k to
k+1 a second order extrapolation is performed, and for $j \geq k+1$, the future
time, only perturbation analysis about some nominal trajectory is carried
out. By assuming that perturbation control will be applied in addition to
a nominal from time k+1 to the end of the process, one obtains the expression
of the cost (3.13), which includes the future estimation performance. Since
this performance depends on the present control $u(k)$, the method is to choose
the control so as to minimize (3.13), which includes both control performance
and estimation performance. It has to be pointed out that the use of the
(fictitious) nominal trajectories and perturbations between k+1 and N is
for the sole purpose of obtaining the value of the cost-to-go. The
procedure is repeated at every step to obtain the value of the control to be
used next.

Let $\{x_\nu(k+1)\}_{\nu=1}^{l}$ be a set of points in the state space that are selected
on the basis of past estimation performance. Associated with each $x_\nu(k+1)$
is a sequence of nominal controls $\{u_\nu(j)\}_{j=k+1}^{N-1}$. The nominal trajectory
is obtained by

$$x_\nu(j+1) = f[j, x_\nu(j), u_\nu(j)] , \quad j = k+1, \ldots, N-1 \tag{3.18}$$

Next, consider a control $u(k)$ to be applied at time k. Expanding the
function $f[k, \cdot, u(k)]$ about $\hat{x}(k|k)$ up to second order terms, we have the
predicted state and covariance given by

$$\hat{x}(k+1|k) = f[k, \hat{x}(k|k), u(k)] + \tfrac{1}{2} \sum_{i=1}^{n} e_i \operatorname{tr}\big\{ f^i_{xx}[\hat{x}(k|k), u(k)] \, \Sigma(k|k) \big\} \tag{3.19}$$

$$\begin{aligned}
\Sigma(k+1|k) &= f_x[\hat{x}(k|k), u(k)] \, \Sigma(k|k) \, f_x'[\hat{x}(k|k), u(k)] + Q(k) \\
&\qquad + \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} e_i e_j' \operatorname{tr}\big\{ f^i_{xx}[\hat{x}(k|k), u(k)] \, \Sigma(k|k) \, f^j_{xx}[\hat{x}(k|k), u(k)] \, \Sigma(k|k) \big\}
\end{aligned} \tag{3.20}$$

where $f^i_{xx}$ denotes the Hessian of the $i$th component of $f$ with respect to
$x$ and $\{e_i\}_{i=1}^{n}$ is the natural base in $R^n$. The updated error covariance for
the incremental state estimate is:

$$\Sigma(k+1|k+1) = \big\{ I - V(k+1) \, h_x[k+1, \hat{x}(k+1|k)] \big\} \Sigma(k+1|k) \tag{3.21}$$

$$V(k+1) = \Sigma(k+1|k) \, h_x'[k+1, \hat{x}(k+1|k)] \big\{ h_x[k+1, \hat{x}(k+1|k)] \, \Sigma(k+1|k) \, h_x'[k+1, \hat{x}(k+1|k)] + R(k+1) \big\}^{-1} \tag{3.22}$$

If the predicted state, $\hat{x}(k+1|k)$, caused by the control $u(k)$ is closest to
$x_\nu(k+1)$, i.e.,

$$\| \hat{x}(k+1|k) - x_\nu(k+1) \| \leq \| \hat{x}(k+1|k) - x_{\nu'}(k+1) \| ; \quad \nu' = 1, \ldots, l \tag{3.23}$$

then the future analysis will be based on perturbation about the $\nu$th nominal
as derived in the previous section. Note that for all admissible $u(k)$, there
corresponds a nearest nominal such that (3.23) is satisfied. The error
covariance $\Sigma_\nu(j+1|j+1)$ is given by (3.15)-(3.17) with initial condition
$\Sigma(k+1|k+1)$, where $\Sigma(k+1|k+1)$ is given by (3.21) and (3.22).
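The second order extrapolation and nominal selection steps translate directly into a few lines of linear algebra. The following is a minimal sketch of (3.19)-(3.23), assuming the caller supplies the function value, Jacobian, and per-component Hessians already evaluated at $(\hat{x}(k|k), u(k))$; the function names and argument layout are hypothetical.

```python
import numpy as np

def second_order_predict(f_val, f_x, f_xx, sigma, Q):
    """Predicted mean (3.19) and covariance (3.20).

    f_val : f[k, x_hat(k|k), u(k)], shape (n,)
    f_x   : Jacobian of f at the same point, shape (n, n)
    f_xx  : sequence of n Hessians, f_xx[i] for the i-th component of f
    """
    n = f_val.size
    traces = np.array([np.trace(f_xx[i] @ sigma) for i in range(n)])
    x_pred = f_val + 0.5 * traces
    # Entry (i, j) of the e_i e_j' sum is tr{f_xx^i Sigma f_xx^j Sigma}.
    cross = np.array([[np.trace(f_xx[i] @ sigma @ f_xx[j] @ sigma)
                       for j in range(n)] for i in range(n)])
    sigma_pred = f_x @ sigma @ f_x.T + Q + 0.5 * cross
    return x_pred, sigma_pred

def measurement_update(sigma_pred, h_x, R):
    """Updated covariance (3.21) with the gain (3.22)."""
    gain = sigma_pred @ h_x.T @ np.linalg.inv(h_x @ sigma_pred @ h_x.T + R)
    return (np.eye(sigma_pred.shape[0]) - gain @ h_x) @ sigma_pred

def nearest_nominal(x_pred, nominal_points):
    """Index nu of the nominal point closest to the prediction, as in (3.23)."""
    distances = [np.linalg.norm(x_pred - x_nu) for x_nu in nominal_points]
    return int(np.argmin(distances))

# Scalar check with the assumed map f(x) = x**2, prior mean 1.5, variance 0.3.
m, P = 1.5, 0.3
x_pred, sigma_pred = second_order_predict(
    f_val=np.array([m**2]), f_x=np.array([[2.0 * m]]),
    f_xx=[np.array([[2.0]])], sigma=np.array([[P]]), Q=np.array([[0.0]]))
```

For a quadratic map and a Gaussian prior, the second order formulas reproduce the exact mean $m^2 + P$ and variance $4m^2 P + 2P^2$, which makes this a convenient sanity check before applying the routine to a more general $f$.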

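Since we assume that for $j \geq k+1$, only perturbation analysis will be
carried out along the $\nu$th nominal if (3.23) is satisfied, the optimal
cost-to-go at time k+1 can be written, on the basis of the results of the
previous subsection, as follows (see Appendix A for the derivation):

$$\begin{aligned}
I^*[\hat{x}(k+1|k+1), \Sigma(k+1|k+1), k+1] &= J_\nu(k+1) + g_\nu(k+1) + \tfrac{1}{2} \operatorname{tr}\Big\{ \psi_{\nu,xx} \Sigma_\nu(N|N) \\
&\qquad + \sum_{j=k+1}^{N-1} \big[ H_{\nu,xx}(j) \Sigma_\nu(j|j) + [\Sigma_\nu(j+1|j) - \Sigma_\nu(j+1|j+1)] K_\nu(j+1) \big] \Big\} \\
&\qquad + p_\nu'(k+1) \, \tilde{x}_\nu(k+1|k+1) + \tfrac{1}{2} \tilde{x}_\nu'(k+1|k+1) \, K_\nu(k+1) \, \tilde{x}_\nu(k+1|k+1)
\end{aligned} \tag{3.24}$$

where

$$\tilde{x}_\nu(k+1|k+1) \triangleq \hat{x}(k+1|k+1) - x_\nu(k+1) . \tag{3.25}$$

Therefore, the cost of applying $u(k)$ can be approximated as follows:

$$\begin{aligned}
I_d[u(k)] &= E\big\{ \phi[u(k), k] + L[x(k), k] + I^*[\hat{x}(k+1|k+1), \Sigma(k+1|k+1), k+1] \ \big| \ Y^k \big\} \\
&= \phi[u(k), k] + E\{ L[x(k), k] \,|\, Y^k \} + J_\nu(k+1) + g_\nu(k+1) + \tfrac{1}{2} \operatorname{tr}\Big\{ \psi_{\nu,xx} \Sigma_\nu(N|N) \\
&\qquad + \sum_{j=k+1}^{N-1} \big[ H_{\nu,xx}(j) \Sigma_\nu(j|j) + [\Sigma_\nu(j+1|j) - \Sigma_\nu(j+1|j+1)] K_\nu(j+1) \big] \Big\} \\
&\qquad + p_\nu'(k+1) \, \tilde{x}_\nu(k+1|k) + \tfrac{1}{2} \tilde{x}_\nu'(k+1|k) \, K_\nu(k+1) \, \tilde{x}_\nu(k+1|k) \\
&\qquad + \tfrac{1}{2} \operatorname{tr}\big\{ [\Sigma(k+1|k) - \Sigma(k+1|k+1)] K_\nu(k+1) \big\}
\end{aligned} \tag{3.26}$$

where

$$\tilde{x}_\nu(k+1|k) \triangleq \hat{x}(k+1|k) - x_\nu(k+1) . \tag{3.27}$$

† If there is more than one $\nu$ satisfying (3.23), we may choose any one
of them.

Since $E\{L[x(k), k] \,|\, Y^k\}$ is independent of $u(k)$, minimizing the cost (3.26)
is equivalent to minimizing

$$\begin{aligned}
J_d[u(k)] &\triangleq J_\nu(k+1) + \phi[u(k), k] + g_\nu(k+1) + p_\nu'(k+1) [\hat{x}(k+1|k) - x_\nu(k+1)] \\
&\qquad + \tfrac{1}{2} [\hat{x}(k+1|k) - x_\nu(k+1)]' K_\nu(k+1) [\hat{x}(k+1|k) - x_\nu(k+1)] \\
&\qquad + \tfrac{1}{2} \operatorname{tr}\Big\{ [\Sigma(k+1|k) - \Sigma(k+1|k+1)] K_\nu(k+1) + \psi_{\nu,xx} \Sigma_\nu(N|N) \\
&\qquad\qquad + H_{\nu,xx}(k+1) \Sigma(k+1|k+1) + \sum_{j=k+2}^{N-1} H_{\nu,xx}(j) \Sigma_\nu(j|j) \\
&\qquad\qquad + \sum_{j=k+1}^{N-1} [\Sigma_\nu(j+1|j) - \Sigma_\nu(j+1|j+1)] K_\nu(j+1) \Big\}
\end{aligned} \tag{3.28}$$

subject to the constraints (3.19), (3.20), (3.21), (3.22) and (3.23), and
where $\Sigma_\nu(j+1|j)$, $\Sigma_\nu(j+1|j+1)$ are given by (3.17) and (3.15), respectively,
with initial condition

$$\Sigma_\nu(k+1|k+1) = \Sigma(k+1|k+1) . \tag{3.29}$$

The procedure for computing $J_d[u(k)]$ is also described in Fig. 3.1.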


One  can  extend  this  to  the  situation  in  which  a  nominal  control  sequence 
{.^[j  ;i(k+l  |k)  ]  }j_k+1  *s  associated  with  each  predicted  state  x(k+l|k). 

Thus  if  a  control  u(k)  yields  x(k+l|k),  future  analysis  will  be  carried  out 


around  the  nominal  control  { Uq  [  j  ;x(k+l  |k)]  and  the  nominal  trajectory 


x  [j+l;x(k+l|k)]  *  f.{j,x  [j;x(k+l|k)],  u  [j;x(k+l|k)]>  ; 


j  “  k+1,. . . ,N-1 


x  [k+l;x(k+ljk)J  =  x(k+l|k)  . 


(3.30) 


In  this  case,  J^[u(h)]  becomes 
$$
\begin{aligned}
J_d[u(k)] ={}& J_o(k+1) + \phi[u(k),k] + g_o(k+1) \\
&+ \tfrac{1}{2}\,\mathrm{tr}\Big\{[\Sigma(k+1|k) - \Sigma(k+1|k+1)]\,K_o(k+1) + \psi_{o,xx}\,\Sigma_o(N|N) \\
&\qquad + H_{o,xx}(k+1)\,\Sigma(k+1|k+1) + \sum_{j=k+2}^{N-1} H_{o,xx}(j)\,\Sigma_o(j|j) \\
&\qquad + \sum_{j=k+1}^{N-1} [\Sigma_o(j+1|j) - \Sigma_o(j+1|j+1)]\,K_o(j+1)\Big\}.
\end{aligned}
\tag{3.31}
$$

FIGURE 3.1  COMPUTATION OF THE ONE-STEP DUAL COST $J_d[u(k)]$


Depending on the problem under consideration, one may or may not want to discretize the state space for the predicted state.

Denote the optimal solution for the above one-step optimization problem by u*(k). When u*(k) is applied to the system and a new observation y(k+1) is obtained, the estimate of x(k+1) and its error covariance are updated and the same procedure is repeated to obtain u*(k+1). Proceeding from k=0 to k=N-1, we obtain a sequence of controls $\{u^*(k)\}_{k=0}^{N-1}$, which is called the one-step optimum dual control.

Note  that  in  the  above  development,  the  choice  of  future  nominal  control 
is  "fictitious";  it  is  only  used  to  approximate  the  optimal  cost-to-go  function. 
Therefore,  its  choice  is  quite  flexible  and  is  dependent  upon  the  problem  under 
consideration.  In  Section  IV,  we  indicate  how  these  nominals  can  be  selected 
for  a  special  class  of  problems. 


3.4  Remarks 

1.  Note that in most cases, $J_d$ given in (3.28) or (3.31) cannot be expressed explicitly as a function of u(k); therefore, straightforward minimization techniques, such as taking the derivative with respect to u(k) and setting it to zero, would be of no use.

Because of the rather complicated dependence of $J_d$ on u(k), one has to search to find the minimizing u(k) which will be applied to the system. Search methods appropriate for finding u(k) are those of local variations or, if the control is a scalar, a line search, e.g., Fibonacci. To obtain u*(k), start the search at $u^{CE}(k)$, the first of the sequence of controls obtained by assuming certainty equivalence (i.e., the separation theorem) to be valid. Then determine the direction in which $J_d$ decreases, next the "box" in which the minimum lies, then narrow the box down to a certain predetermined size, and finally make a quadratic interpolation from the last three points; the result is taken as u*(k). A search procedure is described in Appendix B.
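
The search just outlined (find the descent direction, bracket the minimum, shrink the box, then fit a parabola) can be sketched as follows. This is only an illustrative sketch, not the procedure of Appendix B: the function name `one_dim_search`, the midpoint rule used to shrink the box, and the default step and box sizes are all assumptions made for the example, and `cost` stands in for whatever routine evaluates $J_d[u(k)]$.

```python
def one_dim_search(cost, u_start, step=0.5, box=1e-2, max_marches=200):
    """Scalar search: direction, bracket ("box"), narrowing, quadratic fit.

    cost    -- callable returning the one-step cost for a scalar control
    u_start -- starting point (e.g., a certainty-equivalence control)
    """
    f_start = cost(u_start)
    # 1. Determine the direction in which the cost decreases.
    if cost(u_start + step) > f_start:
        step = -step
    a, b, c = u_start - step, u_start, u_start + step
    fa, fb, fc = cost(a), f_start, cost(c)
    # 2. March in that direction until the cost rises again (the "box").
    marches = 0
    while fc < fb and marches < max_marches:
        a, fa, b, fb = b, fb, c, fc
        c = b + step
        fc = cost(c)
        marches += 1
    # 3. Narrow the box by probing the midpoint of the larger sub-interval.
    while abs(c - a) > box:
        if abs(c - b) > abs(b - a):
            x = 0.5 * (b + c)
            fx = cost(x)
            if fx < fb:
                a, fa, b, fb = b, fb, x, fx
            else:
                c, fc = x, fx
        else:
            x = 0.5 * (a + b)
            fx = cost(x)
            if fx < fb:
                c, fc, b, fb = b, fb, x, fx
            else:
                a, fa = x, fx
    # 4. Quadratic interpolation through the last three points.
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b if den == 0 else b - 0.5 * num / den
```

Each cost evaluation requires a full backward and forward sweep along the nominal, so a derivative-free search with few evaluations is the natural choice here.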

2.  The approach described in this section requires appropriate selection of nominal controls, which is essential in approximating the future optimal cost-to-go. Note that these nominals are not applied in the future; they serve only to give a rough idea of the optimum cost corresponding to future learning and control. This flexibility is a distinguishing feature of our approach. One may consider this to be an advantage or a disadvantage, depending on one's viewpoint. Clearly, such an approach will not be of use to a designer who knows nothing about the system he is controlling, since he is unable to select a set of appropriate nominal controls. However, an engineer who is familiar with the system he is controlling can use his heuristic knowledge to select the nominal controls. For him, this approach is of great value, because it makes use of his knowledge to come up with a good control strategy in a systematic manner. Thus, in some sense, the approach bears some characteristics of heuristic programming methods, where use is made of knowledge of the system to reduce the dimensionality of the program.

3.  Let  us  comment  on  the  dual  nature  of  the  control.  The  estimation 
purpose  of  the  control  is  reflected  by  the  covariances  appearing  in 
(3.20)-(3.22).  If  the  predicted  and  updated  error  covariances  are 
independent  of  the  control,  the  dual  property  will  disappear.  This 
would  be  the  case  if  the  system  is  linear  (with  known  parameters). 

In  general,  this  dual  property  of  the  control  is  important. 

4.  We shall also distinguish two different types of learning procedures. Note that if the function $f_x(x,u)$ is not a function of the control u (e.g., when $f(x,u) = f(x) + g(u)$) and the measurements are linear, then the error covariances $\Sigma(k+1|k)$ and $\Sigma(k+1|k+1)$ will be independent of the control action at time k (see (3.20) and (3.21)). The control does not influence the estimation performance in one step, but the effect of the control on future estimation will appear n steps (n>1) after the time it is applied (note the dependence on the nominal in (3.15)-(3.17)). In this case, the control has the capability of exciting certain modes of the system that will, in the future, enhance the estimation. A typical example is the problem of controlling a linear system with known zeroes but unknown poles. In the second case, if $f_x(x,u)$ is a function of u, then the error covariances $\Sigma(k+1|k)$ and $\Sigma(k+1|k+1)$ will both be dependent on the control action. Besides exciting certain modes of the system, the control also has the capability of directly regulating the signal-to-noise ratio and isolating the effects of different parameters. A typical example is the problem of controlling a system where the control multiplies the state and/or some unknown parameters of the system. Thus, we see that the control is "actively adaptive" since it regulates its learning in an optimal manner.

5.  A sufficient condition for $K_o(j)$, $p_o(j)$ and $g_o(j)$ ((3.12), (3.11), (3.14)) to exist is

$$\mathcal{H}_{o,uu}(j) > 0. \tag{3.32}$$

Let us consider the deterministic control problem of minimizing the performance $J_o(k)$ given by (3.7) for the system described by (3.4). The Hamiltonian for this problem is given by (3.10); therefore, from (3.11)-(3.14), $p_o(j)$ is the adjoint variable, $K_o(j)$ is the return matrix for the linear quadratic control problem whose state equation is (3.4) linearized about the nominal and whose cost matrices are the second derivatives of the Hamiltonian evaluated along the nominal, and the quantity $J_o(k)$ is the deterministic performance when the initial state is $x_o(k)$ and the nominal control is used. Thus condition (3.32) is equivalent to the existence of neighboring stationary paths about the nominal trajectory. In general, if the deterministic control problem has a solution and if the nominal control trajectory is that solution, then $H_{o,uu}(j) + f_{o,u}'(j)\,K_o(j+1)\,f_{o,u}(j)$ will be positive semi-definite. Where this deterministic optimal control is non-singular, this matrix is positive definite and thus invertible; where it is singular, the inverse should be replaced by the pseudo-inverse. For a general nominal trajectory no such statements can be made; however, for nominal trajectories near the deterministic optimum, one would expect similar properties to hold. Thus, one reasonable choice of the set $\{U_v(k+1,N-1)\}_v$ would be the deterministic optimal controls associated with $\{x_v(k+1)\}_v$.

In the special case where f[k,x(k),u(k)] is first order in the control and $\phi(u,j)$ is strictly convex in u, we have $H_{o,uu}(j) = \phi_{o,uu}(j)$, which is positive definite by the convexity of $\phi$; therefore in this special case the matrix can be inverted.

6.  The results hold even when the cost has the more general form (2.5). The only change one needs to make is to replace (3.10) by

$$H_o(j) = L[x_o(j),u_o(j),j] + p_o'(j+1)\,f[j,x_o(j),u_o(j)] \tag{3.10$'$}$$

and the term $\phi[u(k)]$ in (3.28) by

$$E\{L[x(k),u(k),k] \mid Y^k\} \approx L[\hat{x}(k|k),u(k),k] + \tfrac{1}{2}\,\mathrm{tr}\{L_{xx}[\hat{x}(k|k),u(k),k]\,\Sigma(k|k)\}. \tag{3.33}$$
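
The expansion in (3.33) is the usual second-order approximation of an expected cost about the estimate, and it is exact when the cost is quadratic in the state. The following minimal sketch checks this numerically; the helper name `second_order_expected_cost` and the test cost are hypothetical and chosen only for illustration.

```python
import numpy as np

def second_order_expected_cost(cost, hessian, x_hat, sigma):
    """Approximate E{cost(x)} by cost(x_hat) + 0.5 * tr{cost_xx(x_hat) * Sigma}."""
    return cost(x_hat) + 0.5 * np.trace(hessian(x_hat) @ sigma)

# Illustrative quadratic cost (assumed for the example, not from the report).
W = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, -1.0])
quad_cost = lambda x: 0.5 * x @ W @ x + c @ x

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 2))
x_hat = samples.mean(axis=0)              # sample mean plays the role of the estimate
sigma = np.cov(samples.T, bias=True)      # matching (biased) sample covariance

approx = second_order_expected_cost(quad_cost, lambda x: W, x_hat, sigma)
exact = np.mean([quad_cost(x) for x in samples])
```

For a non-quadratic cost the two numbers would differ by the neglected higher-order terms.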




IV.  ACTIVELY  ADAPTIVE  CONTROL  FOR  STOCHASTIC  LINEAR 
SYSTEMS  WITH  RANDOM  PARAMETERS  VIA  DUAL  CONTROL 

In this section, we consider the control of linear systems with unknown parameters, a class of problems of major theoretical and practical importance. A control strategy that regulates its speed of learning (i.e., the adaptivity is not passive but active) is obtained for this class of problems by specializing the results of Section III.

4.1  Problem  Statement 

Consider  a  discrete-time  linear  system  described  by 

$$
\begin{aligned}
x(k+1) &= A[k,\theta(k)]\,x(k) + b[k,\theta(k)]\,u(k) + \xi(k) \\
y(k) &= C[k,\theta(k)]\,x(k) + \eta(k), \qquad k = 0,1,\ldots
\end{aligned}
\tag{4.1}
$$

where $x(k)\in R^n$, $y(k)\in R^m$, $\theta(k)\in R^s$ and u(k) is a scalar control.† It is assumed that $\theta(k)$ is a Markov process satisfying

$$\theta(k+1) = D(k)\,\theta(k) + \gamma(k), \qquad k = 0,1,\ldots \tag{4.2}$$

where D(k) is a known matrix. The vectors $\{x(0),\ \theta(0),\ \xi(k),\ \eta(k+1),\ \gamma(k),\ k=0,1,\ldots\}$ are assumed to be mutually independent Gaussian random variables with known statistical laws:

$$
\begin{aligned}
&x(0) \sim \mathcal{N}[\hat{x}(0),\Sigma^{xx}(0)]; \quad \theta(0) \sim \mathcal{N}[\hat{\theta}(0),\Sigma^{\theta\theta}(0)]; \quad \xi(k) \sim \mathcal{N}[0,Q(k)]; \\
&\eta(k) \sim \mathcal{N}[0,R(k)]; \quad \gamma(k) \sim \mathcal{N}[0,G(k)]
\end{aligned}
\tag{4.3}
$$

with $\Sigma^{xx}(0) > 0$, $\Sigma^{\theta\theta}(0) > 0$, $R(k) > 0$, $Q(k) \ge 0$, $G(k) \ge 0$. The notation $v \sim \mathcal{N}(a,B)$ is used to denote that the random vector v is Gaussian with mean a and covariance B. Furthermore, we assume that the unknown parameter $\theta(k)$ enters linearly in $A(k,\cdot)$, $b(k,\cdot)$ and $C(k,\cdot)$.

† For simplicity, we shall discuss only the scalar input case. The results can be readily extended to the multi-input case. See also Section 4.5.
A control is admissible if it is non-anticipative; i.e.,

$$u(k) = u(k,Y^k,U^{k-1}); \qquad Y^k = \{y(1),\ldots,y(k)\}; \qquad U^{k-1} = \{u(0),\ldots,u(k-1)\} \tag{4.4}$$

Our objective is to find an admissible control sequence such that the cost functional

$$
J(U) = \tfrac{1}{2}\,E\Big\{[x(N)-\rho(N)]'\,W(N)\,[x(N)-\rho(N)] + \sum_{k=0}^{N-1}\big([x(k)-\rho(k)]'\,W(k)\,[x(k)-\rho(k)] + \lambda(k)\,u^2(k)\big)\Big\}
\tag{4.5}
$$

is minimized subject to the dynamic constraints (4.1) and (4.2). The expectation in (4.5) is over all the underlying random quantities $x(0)$, $\theta(0)$, $\{\xi(k),\ \eta(k+1),\ \gamma(k),\ k=0,1,\ldots,N-1\}$. Assume the following:

1.  $W(k) \ge 0$ and $\lambda(k) > 0$,

2.  $\{\rho(k),\ k=0,1,\ldots,N\}$ is given a priori.

Note that if $\rho(k) = 0$, $k=0,1,\ldots,N$, we have a regulator problem; if $\{\rho(k)\}_{k=0}^{N}$ is a given trajectory, we have a tracking problem; and if $W(k) = 0$, $k=0,\ldots,N-1$ but $W(N) \ne 0$, we have an interception problem.


4.2  Previous Approaches


Before describing our new approach to this class of problems, it is appropriate to summarize some of the past approaches and indicate how this work fits into the whole development.

This problem can be solved exactly if one can solve the stochastic dynamic programming equation (2.8) associated with the problem; unfortunately, a numerical solution is prohibited by the "curse of dimensionality" (see Section II). Thus different approaches have been suggested for treating this class of problems.
One popular approach is certainty equivalence. If, at a given time instant, estimates of the unknown parameters are available, a control law can be obtained by assuming the estimated parameters to be the true ones and solving the control problem accordingly. In this manner, we obtain a control law which is adaptive to the estimates. The problem is now reduced to that of closed-loop parameter estimation. Such an approach has been considered by Farison, et al., and by Saridis and Lobbia. The question now is not "how to control the system," but rather "how well can we estimate the parameters." The advantage of this approach is the simplicity of the control law. The major drawback is that the confidence level of the parameter estimates is ignored in deriving the adaptive control scheme; one would expect such a scheme to result in a control system which is extremely sensitive to stochastic variations, and this turns out to be the case.

If the design of adaptive systems takes not only the instantaneous parameter estimates but also the associated confidence levels into account, it should result in a "better" system. One such method is the open-loop feedback approach. Typical papers along this line are those by Bar-Shalom and Sivan, Curry, Aoki, Spang, and Tse and Athans. In the last-mentioned, it was demonstrated that in the case where only the input gain vector is unknown, the adaptive feedback gains of the control system depend upon the parameter error covariance matrix. In this open-loop feedback approach, the fact that the estimated parameter may not be exact is therefore taken into consideration, but the knowledge of future observation programs is completely ignored. The problem in which the system is linear with unknown parameters that belong to a finite set has been studied by Stein and Saridis and by Lainiotis, et al. Their solution was also of the open-loop feedback type because it did not take into account the effect of the control on the future estimation performance.

Yet another approach is to approximate the dynamic programming equation. Murphy, and Gorman and Zaborsky, used this approach in considering the situation where the gain vector is unknown. To the authors' knowledge, the extension of this approach to more general situations is not found in the literature.
The approach described in this section is based on the one-step optimal dual control theory developed in Section III. As we have noted, such a control scheme has the characteristic of appropriately distributing its energy for learning and control purposes. In view of this, it is clear that the open-loop feedback control is, from the estimation point of view, passive. In contrast, the one-step optimal dual control is active, not only for the control purpose but also for the estimation purpose, because the performance depends also on the "quality" of the estimates. Therefore, the one-step optimal dual control can be called "actively adaptive" since it regulates its adaptation (learning) in a systematic manner.


4.3  The Optimal Cost-to-Go and the Dual Effect

In  this  subsection,  the  results  developed  in  Section  III  will  be 
specialized  to  the  class  of  problems  being  considered  here  to  obtain  the 
approximate  optimal  cost-to-go. 
Let the present time be denoted by k. Given a point represented by the augmented state $z_o(k+1) = [x_o'(k+1),\ \theta_o'(k+1)]'$ in the augmented state space, one associates with it a nominal control sequence denoted by $\{u_o[j;z_o(k+1)]\}_{j=k+1}^{N-1}$. A nominal trajectory originating from $z_o(k+1)$ is generated by applying the above control sequence. Consider a control u(k) applied at time k and the resulting predicted state and covariance, denoted as $\hat{z}(k+1|k)$ and $\Sigma(k+1|k)$, respectively. In order to bring out the dual effect of the control, assume that for time $j \ge k+1$, a second order perturbation analysis will be carried out about the nominal trajectory originating at $z_o(k+1) = \hat{z}(k+1|k)$ with a certain nominal control sequence. The details on how this nominal is obtained are given in Section 4.4. The subscript "o" is used to denote both the "nominal" control $\{u_o[j;\hat{z}(k+1|k)]\}_{j=k+1}^{N-1}$ and the associated nominal trajectory $\{z_o[j;\hat{z}(k+1|k)]\}_{j=k+1}^{N}$. In this manner, one obtains an approximate optimal "cost-to-go" $I^*[\hat{z}(k+1|k),\ \Sigma(k+1|k),\ k+1]$ associated with $\hat{z}(k+1|k)$ and $\Sigma(k+1|k)$, which is a function of u(k). This cost reflects both the future estimation performance and control performance. The minimization of this cost yields u*(k) and the procedure is repeated at every step.
Assume that the one-step prediction $\hat{z}(k+1|k)$ and the associated error covariance

$$\Sigma(k+1|k) = \mathrm{cov}\{z(k+1) \mid Y^k\} \tag{4.6}$$

have been obtained (using, e.g., a second order filter) when a certain control u(k) is applied to the system. Let $\{z_o(j)\}_{j=k+1}^{N}$ be the nominal trajectory obtained by applying the nominal control sequence $\{u_o[j;\hat{z}(k+1|k)]\}_{j=k+1}^{N-1}$ to the deterministic part of the system (4.1), i.e.,

$$
z_o(j+1) = \begin{bmatrix} x_o(j+1) \\ \theta_o(j+1) \end{bmatrix}
= f_o(j) \triangleq \begin{bmatrix} f_o^{x}(j) \\ f_o^{\theta}(j) \end{bmatrix}
= \begin{bmatrix} A_o(j)\,x_o(j) + b_o(j)\,u_o(j) \\ D(j)\,\theta_o(j) \end{bmatrix}
\tag{4.7}
$$

where superscripts denote matrix partitions and

$$A_o(j) \triangleq A(j,\theta_o(j)), \qquad b_o(j) \triangleq b(j,\theta_o(j)) \tag{4.8}$$

with initial condition $z_o(k+1) = \hat{z}(k+1|k)$. For simplicity, the dependence on $\hat{z}(k+1|k)$ will be suppressed and $u_o[j;\hat{z}(k+1|k)]$ denoted by $u_o(j)$.


Define the Jacobian

$$
f_{o,z}(j) \triangleq [\nabla_z f']'\big|_{o}
= \begin{bmatrix} f^{x}_{o,x}(j) & f^{x}_{o,\theta}(j) \\ 0 & f^{\theta}_{o,\theta}(j) \end{bmatrix}
= \begin{bmatrix} A_o(j) & \sum_{i=1}^{n} e_i\,[x_o'(j)\,a^{i}_{,\theta}(j) + u_o(j)\,b^{i}_{,\theta}(j)] \\ 0 & D(j) \end{bmatrix}.
\tag{4.9}
$$

* Since $\theta$ enters linearly in A, C and b, their partials with respect to $\theta$ are constants.

The measurement vector in (4.1) can be written in terms of the augmented state

$$h(j) = [\,C(j,\theta(j)) \;\; 0\,]\; z(j) \tag{4.9a}$$

and its Jacobian evaluated along the nominal is

$$h_{o,z}(j) = \Big[\,C_o(j) \;\;\; \sum_{i=1}^{m} e_i\,x_o'(j)\,c^{i}_{,\theta}(j)\,\Big]; \qquad C_o(j) \triangleq C(j,\theta_o(j)) \tag{4.9b}$$

where $a^i$, $i=1,\ldots,n$ and $c^i$, $i=1,\ldots,m$ are the corresponding rows of A and C, respectively. Similarly,

$$f^{i}_{o,zz}(j) = \begin{bmatrix} 0 & a^{i}_{,\theta}(j) \\ a^{i\,\prime}_{,\theta}(j) & 0 \end{bmatrix} \tag{4.10}$$

and $b^i$ will denote the corresponding component of b.

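Using the results in Section III, the approximate optimal cost-to-go I* is given by (see also Appendix C)

$$
\begin{aligned}
I^*[\hat{z}(k+1|k),\ \Sigma(k+1|k),\ k+1] ={}& J_o(k+1) + g_o(k+1) \\
&+ \tfrac{1}{2}\,\mathrm{tr}\Big\{[\Sigma(k+1|k) - \Sigma(k+1|k+1)]\,K_o(k+1) + \sum_{j=k+1}^{N} \mathcal{W}(j)\,\Sigma_o(j|j) \\
&\qquad + \sum_{j=k+1}^{N-1} [\Sigma_o(j+1|j) - \Sigma_o(j+1|j+1)]\,K_o(j+1)\Big\}
\end{aligned}
\tag{4.11}
$$

where

$$
\begin{aligned}
J_o(k+1) ={}& \tfrac{1}{2}\,[x_o(N)-\rho(N)]'\,W(N)\,[x_o(N)-\rho(N)] \\
&+ \tfrac{1}{2}\sum_{j=k+1}^{N-1}\Big\{[x_o(j)-\rho(j)]'\,W(j)\,[x_o(j)-\rho(j)] + \lambda(j)\,\big(u_o[j;\hat{z}(k+1|k)]\big)^2\Big\}.
\end{aligned}
\tag{4.12}
$$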
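The cost matrices corresponding to the augmented state $z = [x'\ \theta']'$ are denoted by

$$\mathcal{W}(j) = \begin{bmatrix} W(j) & 0 \\ 0 & 0 \end{bmatrix} \tag{4.13}$$

where the off-diagonal 0 denotes an n×s zero matrix, and $K_o(j)$, $g_o(j)$ satisfy the backward equations

$$K_o(j) = \begin{bmatrix} K_o^{xx}(j) & K_o^{x\theta}(j) \\ K_o^{\theta x}(j) & K_o^{\theta\theta}(j) \end{bmatrix} \tag{4.14}$$

$$K_o^{xx}(j) = A_o'(j)\,[I - \mu_o(j)\,K_o^{xx}(j+1)\,b_o(j)\,b_o'(j)]\,K_o^{xx}(j+1)\,A_o(j) + W(j); \qquad K_o^{xx}(N) = W(N) \tag{4.15}$$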


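$$
\begin{aligned}
K_o^{\theta x}(j) ={}& [f^{x\,\prime}_{o,\theta}(j)\,K_o^{xx}(j+1) + D'(j)\,K_o^{\theta x}(j+1)]\,A_o(j) \\
&- \mu_o(j)\Big\{[f^{x\,\prime}_{o,\theta}(j)\,K_o^{xx}(j+1) + D'(j)\,K_o^{\theta x}(j+1)]\,b_o(j) + \sum_{i=1}^{n} p_o^{x,i}(j+1)\,b^{i\,\prime}_{,\theta}(j)\Big\}\,b_o'(j)\,K_o^{xx}(j+1)\,A_o(j) \\
&+ \sum_{i=1}^{n} p_o^{x,i}(j+1)\,a^{i\,\prime}_{,\theta}(j); \qquad K_o^{\theta x}(N) = 0
\end{aligned}
\tag{4.16}
$$

$$
\begin{aligned}
K_o^{\theta\theta}(j) ={}& f^{x\,\prime}_{o,\theta}(j)\,K_o^{xx}(j+1)\,f^{x}_{o,\theta}(j) + D'(j)\,K_o^{\theta x}(j+1)\,f^{x}_{o,\theta}(j) \\
&+ f^{x\,\prime}_{o,\theta}(j)\,K_o^{x\theta}(j+1)\,D(j) + D'(j)\,K_o^{\theta\theta}(j+1)\,D(j) \\
&- \mu_o(j)\,m_o(j)\,m_o'(j); \qquad K_o^{\theta\theta}(N) = 0, \\
m_o(j) \triangleq{}& [f^{x\,\prime}_{o,\theta}(j)\,K_o^{xx}(j+1) + D'(j)\,K_o^{\theta x}(j+1)]\,b_o(j) + \sum_{i=1}^{n} p_o^{x,i}(j+1)\,b^{i\,\prime}_{,\theta}(j)
\end{aligned}
\tag{4.17}
$$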


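$$\mu_o(j) = [\lambda(j) + b_o'(j)\,K_o^{xx}(j+1)\,b_o(j)]^{-1} \tag{4.18}$$

$$g_o(j) = g_o(j+1) - \tfrac{1}{2}\,\mu_o(j)\,[\lambda(j)\,u_o(j) + p_o^{x\,\prime}(j+1)\,b_o(j)]^2; \qquad g_o(N) = 0 \tag{4.19}$$

$$
\begin{aligned}
p_o^{x}(j) ={}& A_o'(j)\,p_o^{x}(j+1) + W(j)\,[x_o(j) - \rho(j)] \\
&- \mu_o(j)\,A_o'(j)\,K_o^{xx}(j+1)\,b_o(j)\,[\lambda(j)\,u_o(j) + p_o^{x\,\prime}(j+1)\,b_o(j)]; \\
p_o^{x}(N) ={}& W(N)\,[x_o(N) - \rho(N)]
\end{aligned}
\tag{4.20}
$$

and $\Sigma_o(j+1|j)$, $\Sigma_o(j+1|j+1)$, the predicted and updated error covariances of the augmented state, satisfy the forward equations:

$$V_o(j+1) = \Sigma_o(j+1|j)\,h_{o,z}'(j+1)\,[h_{o,z}(j+1)\,\Sigma_o(j+1|j)\,h_{o,z}'(j+1) + R(j+1)]^{-1}; \qquad j = k,\ldots,N-1 \tag{4.21}$$

$$\Sigma_o(j+1|j+1) = [I - V_o(j+1)\,h_{o,z}(j+1)]\,\Sigma_o(j+1|j); \qquad j = k,\ldots,N-1 \tag{4.22}$$

$$\Sigma_o(j+1|j) = f_{o,z}(j)\,\Sigma_o(j|j)\,f_{o,z}'(j) + \mathcal{Q}(j); \qquad j = k+1,\ldots,N-1 \tag{4.23}$$

where

$$\mathcal{Q}(j) \triangleq \begin{bmatrix} Q(j) & 0 \\ 0 & G(j) \end{bmatrix}. \tag{4.24}$$

The initial condition in (4.22) is $\Sigma_o(k+1|k) = \Sigma(k+1|k)$, the extrapolation covariance obtained after applying u(k) to the system.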
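
The forward equations (4.21)-(4.23) are read here as a Kalman-type covariance recursion along the nominal, alternating a measurement update with a one-step prediction. A minimal sketch under that reading follows; the function name `nominal_covariances` and the list-based calling convention are assumptions made for the example, and the Jacobians along the nominal are supplied by the caller.

```python
import numpy as np

def nominal_covariances(sigma_pred, H_seq, R_seq, F_seq, Q_seq):
    """Propagate error covariances along a nominal trajectory.

    sigma_pred   -- Sigma(k+1|k), the starting predicted covariance
    H_seq, R_seq -- measurement Jacobians and noise covariances at k+1, ..., N
    F_seq, Q_seq -- state Jacobians and process noise covariances at k+1, ..., N-1
    Returns (updated, predicted) lists of covariance matrices.
    """
    updated, predicted = [], [sigma_pred]
    for i, (H, R) in enumerate(zip(H_seq, R_seq)):
        S = predicted[-1]
        # Gain and measurement update, as in (4.21)-(4.22).
        V = S @ H.T @ np.linalg.inv(H @ S @ H.T + R)
        updated.append((np.eye(S.shape[0]) - V @ H) @ S)
        # One-step prediction, as in (4.23), while a state Jacobian remains.
        if i < len(F_seq):
            F, Q = F_seq[i], Q_seq[i]
            predicted.append(F @ updated[-1] @ F.T + Q)
    return updated, predicted
```

Because the recursion depends on u(k) only through the starting covariance and the nominal, it is re-run for every candidate control examined during the search.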


4.4  Nominal Selection and the Computation of the One-Step Dual Cost

In  this  section,  the  appropriate  selection  of  the  nominal  control 
sequence  associated  with  each  predicted  state  will  be  discussed  along  with 
the  detailed  computation  of  the  one-step  dual  cost. 


One reasonable choice of the nominal control sequence $\{u_o[j;\hat{z}(k+1|k)]\}_{j=k+1}^{N-1}$ would be the certainty equivalence control; i.e., this sequence is obtained by solving the problem of minimizing

$$
\tilde{J}_o(k+1) = \tfrac{1}{2}\,[x_o(N)-\rho(N)]'\,W(N)\,[x_o(N)-\rho(N)]
+ \tfrac{1}{2}\sum_{j=k+1}^{N-1}\big\{[x_o(j)-\rho(j)]'\,W(j)\,[x_o(j)-\rho(j)] + \lambda(j)\,[u_o(j)]^2\big\}
\tag{4.25}
$$

subject to the constraints:

$$x_o(j+1) = A[j;\theta_o(j)]\,x_o(j) + b[j;\theta_o(j)]\,u_o(j); \qquad x_o(k+1) = \hat{x}(k+1|k) \tag{4.26}$$

$$\theta_o(j+1) = D(j)\,\theta_o(j); \qquad \theta_o(k+1) = \hat{\theta}(k+1|k) \tag{4.27}$$

Note that $\theta_o(j)$, $j=k+1,\ldots,N$ can be computed independently of how the control $u_o(j)$ is selected. The solution for this optimization problem can be obtained easily. The optimal control $u_o^*(j)$ is given by

$$u_o^*(j) = -\tilde{\mu}_o(j)\,b_o'(j)\,[\tilde{K}_o(j+1)\,A_o(j)\,x_o(j) + \tilde{p}_o(j+1)] \tag{4.28}$$

where

$$\tilde{\mu}_o(j) = [\lambda(j) + b_o'(j)\,\tilde{K}_o(j+1)\,b_o(j)]^{-1} \tag{4.29}$$

and $\tilde{K}_o(j+1)$, $\tilde{p}_o(j+1)$ satisfy†

$$\tilde{K}_o(j) = A_o'(j)\,[I - \tilde{\mu}_o(j)\,\tilde{K}_o(j+1)\,b_o(j)\,b_o'(j)]\,\tilde{K}_o(j+1)\,A_o(j) + W(j); \qquad \tilde{K}_o(N) = W(N) \tag{4.30}$$

$$\tilde{p}_o(j) = A_o'(j)\,[I - \tilde{\mu}_o(j)\,\tilde{K}_o(j+1)\,b_o(j)\,b_o'(j)]\,\tilde{p}_o(j+1) - W(j)\,\rho(j); \qquad \tilde{p}_o(N) = -W(N)\,\rho(N) \tag{4.31}$$

† The squiggle here denotes quantities related to the certainty equivalence control which determines the nominal trajectory from k+1 to N.
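
The certainty equivalence nominal is a standard linear-quadratic tracking solution: a backward sweep for the return matrix and the linear term, followed by a forward sweep for the control and trajectory. The sketch below follows that pattern under simplifying assumptions that are not in the report: A and b are held constant over the horizon (in the report they vary with $\theta_o(j)$), and the name `ce_nominal` and its list arguments are hypothetical.

```python
import numpy as np

def ce_nominal(A, b, W_seq, rho_seq, lam_seq, x_start):
    """Certainty-equivalence nominal for a scalar-input tracking problem.

    A (n x n), b (length n) -- nominal system matrices, assumed constant here
    W_seq, rho_seq          -- weights and targets at the M+1 state times
    lam_seq                 -- control weights at the M control times
    Returns (controls, states, K, p).
    """
    M, n = len(lam_seq), A.shape[0]
    K, p, mu = [None] * (M + 1), [None] * (M + 1), [None] * M
    K[M] = W_seq[M]
    p[M] = -W_seq[M] @ rho_seq[M]
    # Backward sweep for the return matrix K and the linear term p.
    for j in range(M - 1, -1, -1):
        mu[j] = 1.0 / (lam_seq[j] + b @ K[j + 1] @ b)
        G = A.T @ (np.eye(n) - mu[j] * np.outer(K[j + 1] @ b, b))
        K[j] = G @ K[j + 1] @ A + W_seq[j]
        p[j] = G @ p[j + 1] - W_seq[j] @ rho_seq[j]
    # Forward sweep for the nominal control and trajectory.
    x, u = [np.asarray(x_start, dtype=float)], []
    for j in range(M):
        u.append(-mu[j] * b @ (K[j + 1] @ A @ x[j] + p[j + 1]))
        x.append(A @ x[j] + b * u[j])
    return u, x, K, p
```

The backward sweep does not involve the starting state, so only the forward sweep has to be repeated when the predicted state changes.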


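The corresponding minimum cost is

$$\tilde{J}_o(k+1) = \tfrac{1}{2}\,\hat{x}'(k+1|k)\,\tilde{K}_o(k+1)\,\hat{x}(k+1|k) + \tilde{p}_o'(k+1)\,\hat{x}(k+1|k) + \tilde{g}_o(k+1) \tag{4.32}$$

where $\tilde{g}_o(j)$ satisfies

$$\tilde{g}_o(j) = \tilde{g}_o(j+1) - \tfrac{1}{2}\,\tilde{\mu}_o(j)\,\tilde{p}_o'(j+1)\,b_o(j)\,b_o'(j)\,\tilde{p}_o(j+1) + \tfrac{1}{2}\,\rho'(j)\,W(j)\,\rho(j); \qquad \tilde{g}_o(N) = \tfrac{1}{2}\,\rho'(N)\,W(N)\,\rho(N). \tag{4.33}$$

By comparing (4.30) with (4.15), we see that

$$K_o^{xx}(j) = \tilde{K}_o(j), \qquad j = k+1,\ldots,N \tag{4.34}$$

and hence from (4.18) and (4.29)

$$\tilde{\mu}_o(j) = \mu_o(j). \tag{4.35}$$

It is shown in Appendix D that

$$p_o^{x}(j) = \tilde{K}_o(j)\,x_o(j) + \tilde{p}_o(j). \tag{4.36}$$

From (4.35) and (4.28), we have

$$
\begin{aligned}
\lambda(j)\,u_o(j) + p_o^{x\,\prime}(j+1)\,b_o(j) ={}& \lambda(j)\,u_o(j) + x_o'(j)\,A_o'(j)\,\tilde{K}_o(j+1)\,b_o(j) \\
&+ u_o(j)\,b_o'(j)\,\tilde{K}_o(j+1)\,b_o(j) + \tilde{p}_o'(j+1)\,b_o(j) = 0.
\end{aligned}
\tag{4.37}
$$

Therefore (4.19) becomes

$$g_o(j) = 0; \qquad j = k+1,\ldots,N. \tag{4.38}$$
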
If we do not discretize the predicted state, the one-step cost can be computed by the following procedure:

1.  Obtain $\hat{x}(k+1|k)$, $\hat{\theta}(k+1|k)$, and $\Sigma(k+1|k)$ by

$$\hat{x}(k+1|k) = A[k;\hat{\theta}(k|k)]\,\hat{x}(k|k) + b[k;\hat{\theta}(k|k)]\,u(k) + \tfrac{1}{2}\sum_{i=1}^{n} e_i\,\mathrm{tr}\{f^{i}_{zz}[\hat{z}(k|k),u(k)]\,\Sigma(k|k)\} \tag{4.39}$$

$$\hat{\theta}(k+1|k) = D(k)\,\hat{\theta}(k|k) \tag{4.40}$$

$$\Sigma(k+1|k) = f_{z}(k)\,\Sigma(k|k)\,f_{z}'(k) + \mathcal{Q}(k) + \tfrac{1}{2}\sum_{i=1}^{n+s}\sum_{j=1}^{n+s} e_i\,e_j'\,\mathrm{tr}\{f^{i}_{zz}(k)\,\Sigma(k|k)\,f^{j}_{zz}(k)\,\Sigma(k|k)\} \tag{4.41}$$

2.  Generate $\theta_o(j)$, $j \ge k+1$, via the equation (4.27).

3.  Compute $\tilde{K}_o(j)$, $\tilde{p}_o(j)$, $j=k+1,\ldots,N$, using (4.30) and (4.31). Note that these equations are a function of $\hat{\theta}(k|k)$ only, and are independent of u(k).

4.  Generate $x_o(j)$, $j=k+1,\ldots,N$, using (4.26) (with $u_o(j) = u_o^*(j)$) and (4.28).

5.  Compute $K_o^{\theta x}(j)$, $K_o^{\theta\theta}(j)$, $j=k+1,\ldots,N$, by (4.16) and (4.17). These are backward equations.

6.  Form the matrix $K_o(j)$, $j=k+1,\ldots,N$, using (4.14) and (4.34).

7.  Compute $\Sigma_o(j+1|j+1)$, $j=k,\ldots,N-1$, and $\Sigma_o(j+1|j)$, $j=k+1,\ldots,N-1$, using (4.21)-(4.24).

8.  Obtain the one-step dual cost by

$$
\begin{aligned}
J_d[u(k)] ={}& \tfrac{1}{2}\,\lambda(k)\,u^2(k) + \tfrac{1}{2}\,\hat{x}'(k+1|k)\,\tilde{K}_o(k+1)\,\hat{x}(k+1|k) + \tilde{p}_o'(k+1)\,\hat{x}(k+1|k) \\
&+ \tfrac{1}{2}\,\mathrm{tr}\Big\{\sum_{j=k+1}^{N} W(j)\,\Sigma_o^{xx}(j|j) + [\Sigma(k+1|k) - \Sigma(k+1|k+1)]\,K_o(k+1) \\
&\qquad + \sum_{j=k+1}^{N-1} [\Sigma_o(j+1|j) - \Sigma_o(j+1|j+1)]\,K_o(j+1)\Big\}
\end{aligned}
\tag{4.42}
$$


4.5  Remarks 

1.  The minimization of (4.42) is done by performing a search for u*(k). Since u is scalar, we can use the quadratic fit optimization method described in Appendix B.

2.  The  dual  property  of  the  control  is  revealed  in  (4.42)  where 
the  one-step  cost  to  be  minimized  includes  both  control  and 
estimation  cost. 

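3.  Let us partition the error covariance

$$\Sigma_o = \begin{bmatrix} \Sigma_o^{xx} & \Sigma_o^{x\theta} \\ \Sigma_o^{\theta x} & \Sigma_o^{\theta\theta} \end{bmatrix}. \tag{4.43}$$

Then if $\Sigma^{\theta\theta}(k+1|k+1) \approx 0$ for large k, we must also have $\Sigma^{x\theta}(k+1|k+1) \approx 0$ and the one-step dual cost becomes

$$
\begin{aligned}
J_d[u(k)] ={}& \tfrac{1}{2}\,\lambda(k)\,u^2(k) + \tfrac{1}{2}\,\hat{x}'(k+1|k)\,\tilde{K}_o(k+1)\,\hat{x}(k+1|k) + \tilde{p}_o'(k+1)\,\hat{x}(k+1|k) \\
&+ \tfrac{1}{2}\,\mathrm{tr}\Big\{\sum_{j=k+1}^{N} W(j)\,\Sigma_o^{xx}(j|j) + K_o^{xx}(k+1)\,[\Sigma^{xx}(k+1|k) - \Sigma^{xx}(k+1|k+1)] \\
&\qquad + \sum_{j=k+1}^{N-1} K_o^{xx}(j+1)\,[\Sigma_o^{xx}(j+1|j) - \Sigma_o^{xx}(j+1|j+1)]\Big\}.
\end{aligned}
\tag{4.44}
$$

Since if $\Sigma^{\theta\theta}(k+1|k+1) = 0$,

$$\mathrm{tr}\{f^{i}_{zz}[\hat{z}(k|k),u(k)]\,\Sigma(k|k)\} = 0 \tag{4.45}$$

we have from (4.39)

$$\hat{x}(k+1|k) = A[k,\hat{\theta}(k|k)]\,\hat{x}(k|k) + b[k,\hat{\theta}(k|k)]\,u(k). \tag{4.46}$$

Also, one can easily show from (4.21)-(4.24) that $\Sigma_o^{xx}(j+1|j+1)$, $j=k,\ldots,N-1$, satisfy the minimum error equation of the linear system with known parameters $\theta(j) = \hat{\theta}(j|k)$. These imply that if we have high confidence in the parameter estimate, we can assume separation to hold.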

4.  The one-step dual cost also reflects the effect of the future observation program. For example, if it is known a priori that during the interval $\ell \le j \le N$, $\ell > k$, no observations will be made, then we would have $\Sigma_o(j|j-1) = \Sigma_o(j|j)$. In this case $J_d[u(k)]$ becomes

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Jd[u(k)]  -  |x(k)  u2(k)  +  |*(kfl|k)  K^k+l)  k(k+l  |k) 

+  %(k+l)  *<k+l|k)  +  j  tr{  W(j)  ^(j+ll  j+1) 

j-k+1 

N 

+^w(3)  ^(jU) 

+  (k+l) [I(k+1 1 k)  -  ^ (k+1 1 k+1) ] 

+£  Kb(J+1)Il0(i+1b>  “  Io(j+1lj+1>1}  (4,/'7) 

j-k+2 

Therefore, the knowledge that future observations will or will not be taken would change the present control strategy. If future learning will not take place, the present control tries to minimize the average control performance, whereas if future observations will take place, the present control will invest some of its energy to help the future learning. It is in this way that the dual control regulates its future learning under some control objective. Because of this "active learning" characteristic we call this control strategy an actively adaptive control.

5.  The  estimation  cost  of  (4.42)  is  also  a  function  of  time-to-go.  In 
the  beginning  of  the  control  interval,  the  estimation  cost  is  rel¬ 
atively  high.  The  one-step  optimal  dual  control  must  therefore  be 
selected  so  that  it  compromises  between  control  and  estimation  purposes. 
When  k  is  approaching  N-l,  the  estimation  cost  becomes  smaller,  and 
thus  the  one-step  optimal  dual  control  will  give  less  weight  to  the 
estimation  part  and  will  finally  concentrate  on  the  control  purpose. 
6.  For the case where the control u(k) is a vector rather than a scalar, one can obtain exactly the same equations as above except that now $\tilde{\mu}_o(j)$, $\mu_o(j)$ are matrices and care is required in their placement in Equations (4.30)-(4.33) and (4.15)-(4.20). In the vector control case, the search for the one-step optimal control is more complicated since we are searching over a volume rather than a line. Conceptually this does not create any new difficulty.

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V.  SIMULATION  STUDIES 

In this section, three example problems pertaining to dual control are considered, the purposes of which are:

(1)  To  investigate  the  computational  feasibility  of  the  one-step 
optimal  dual  control  algorithm. 

(2)  To  compare  this  algorithm  with  another  widely  used  suboptimal 
algorithm  —  the  certainty  equivalence. 

(3)  To  understand  the  dual  nature  of  the  proposed  algorithm;  in 
particular,  to  understand  the  learning  purpose  of  the  control. 

The  first  —  the  scalar  case  example  —  will  be  a  simple  one,  so  that  we  may 
understand  the  implications  more  clearly.  The  other  two  —  on  interception 
and  soft  landing  —  will  be  more  complicated  and  will  give  additional  insight 
into  the  dual  control  and  some  indication  as  to  the  computation  feasibility 
of  the  proposed  algorithm. 

5.1  Scalar  Case  Example 

Consider a scalar linear system

$$
\begin{aligned}
x(k+1) &= a\,x(k) + b\,u(k) + \xi(k) \\
y(k) &= x(k) + \eta(k)
\end{aligned}
\tag{5.1}
$$

where a, b are unknown constants and $\xi(k)$, $\eta(k)$ are independent zero-mean white noises with variances q and r, respectively. The problem is to find a control sequence $\{u^*(k)\}_{k=0}^{N-1}$ such that the performance

$$J = E\Big\{\tfrac{1}{2}\,c\,[x(N) - \rho]^2 + \tfrac{1}{2}\sum_{k=0}^{N-1} u^2(k)\Big\}, \qquad c > 0 \tag{5.2}$$

is minimized subject to the constraint (5.1) and

$$u^*(k) = u^*(Y^k, U^{k-1}, k). \tag{5.3}$$

A  comparison  of  the  certainty  equivalence  (C.E.)  control  strategy,  and 
the  actively  adaptive  dual  control  strategy  as  described  in  Section  IV  will 
be  illustrated. 
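
To make the example concrete, the following sketch simulates one run of the scalar system for a given control sequence and evaluates the realized cost. It is not the simulation code behind the tables; the function name `simulate_scalar` is hypothetical, and the placement of the weight c on the terminal miss term is an assumption about the cost functional.

```python
import numpy as np

def simulate_scalar(u_seq, a=0.8, b=0.5, q=0.25, r=0.04,
                    x0=0.0, rho=5.0, c=100.0, rng=None):
    """Simulate one run of the scalar system and return (cost, observations).

    q and r are the variances of the process and measurement noises;
    the cost is 0.5*c*(x(N) - rho)**2 + 0.5*sum(u**2) (assumed form).
    """
    rng = np.random.default_rng() if rng is None else rng
    x, observations = x0, []
    for u in u_seq:
        x = a * x + b * u + np.sqrt(q) * rng.standard_normal()
        observations.append(x + np.sqrt(r) * rng.standard_normal())
    cost = 0.5 * c * (x - rho) ** 2 + 0.5 * sum(u * u for u in u_seq)
    return cost, observations
```

A closed-loop strategy would compute each u(k) from the observations gathered so far; the open-loop form above is only meant to show how a single run is scored.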


Two cases are considered. For both cases, N = 20, $\rho$ = 5, c = 100, x(0) = 0, a = 0.8, b = 0.5, q = 0.25, r = 0.04; the initial guesses are $\hat{x}(0|0) = 0.13$, $\hat{a}(0|0) = 1.2$, $\hat{b}(0|0) = 0.3$ with initial error covariance


(5.4) 


In case 1, the observations are available for all k = 1, 2, ..., 19; in case 2, the observations are available at k = 1, ..., 14; for k ≥ 15, no observation is available. It is impossible to see how close the dual control strategy performance is to that of the truly optimum control strategy, since the truly optimum control strategy is very difficult to obtain. To give an
idea  about  the  performance  level  of  the  dual  control  strategy,  we  shall 
include  the  results  for  the  optimal  control  when  the  parameters  are  all 
known.  The  performance  for  this  will  serve  as  a  lower  bound.  It  must  be 
kept  in  mind  that  this  lower  bound  is  not  achievable  even  by  the  truly 
optimal  stochastic  control  for  our  problem.  Ten  Monte  Carlo  runs  were  per¬ 
formed  for  both  cases,  the  results  of  which  are  shown  in  Tables  5.1  and  5.2. 
The  first  column  shows  the  results  for  the  optimum  control  when  the  para¬ 
meters  are  known. 


From Table 5.1, we see that, on the average, the dual control performs better
than the C.E. control (average performance 32.0 versus 34.7). More importantly,
the dual control performance deviates relatively little from its average
(standard deviation 10, versus 17 for the C.E. control). This indicates that the
dual control is more reliable than the C.E. control under stochastic effects.

TABLE 5.1

COMPARISON OF DUAL CONTROL WITH C.E.
CONTROL FOR THE SCALAR EXAMPLE (CASE 1)

           AVERAGE MISS   AVERAGE TERMINAL   AVERAGE TERMINAL   AVERAGE       RANGE OF       STANDARD DEVIATION
           DISTANCE       ERROR SQUARED      ERROR SQUARED      PERFORMANCE   PERFORMANCE    OF PERFORMANCE
           SQUARED        IN a               IN b

OPTIMUM    0.0653         0                  0                  20.7          15.71-36.34    6
C.E.       0.311          0.126              0.233              34.7          22.11-70.43    17
DUAL       0.219          0.125              0.228              32.0          22.04-48.40    10

TABLE 5.2

COMPARISON OF DUAL CONTROL WITH C.E.
CONTROL FOR THE SCALAR EXAMPLE (CASE 2)

           AVERAGE MISS   AVERAGE TERMINAL   AVERAGE TERMINAL   AVERAGE       RANGE OF       STANDARD DEVIATION
           DISTANCE       ERROR SQUARED      ERROR SQUARED      PERFORMANCE   PERFORMANCE    OF PERFORMANCE
           SQUARED        IN a               IN b

OPTIMUM    0.097          0                  0                  22.3          17.66-43.26    7
C.E.       7.353          1.609              0.650              308.6         39.31-1882.5   561
DUAL       0.143          0.282              0.258              74.5          66.6-113.36    13

FIGURE  5.2  LEARNING  HISTORY  FOR  DUAL  CONTROL  AND 
C.E.  CONTROL  (ONE  SAMPLE  RUN  FOR  CASE 


FIGURE 5.3  DUAL CONTROL HISTORY FOR CASE 1
AND CASE 2 (ONE SAMPLE RUN)


FIGURE 5.4  EVOLUTION OF THE PARAMETER ESTIMATES
OBTAINED BY APPLYING THE ONE-STEP DUAL
CONTROL (CASE 1 AND CASE 2; ONE SAMPLE RUN)


Both the dual control and the C.E. control perform well in terms of
estimation at the terminal time, as shown in Figs. 5.1 and 5.2. Thus the
fact that the dual control performs better than the C.E. control must be
related to how fast learning takes place. In Fig. 5.1, the control
histories are plotted for one particular sample run. In this sample run,
the noise sequences are the same for the optimal control with known
parameters, the C.E. control, and the dual control. If the parameters are
known exactly, learning is obviously not required, and thus the control
action serves only the control objective. Notice that to achieve this
objective, the control energy should be kept small in the beginning and
become larger toward the terminal time. In general, the C.E. control has
this characteristic (the overshoots at about k = 10 and k = 13 are due to
stochastic effects). The dual control, however, acts quite differently:
at the initial time, the control value is quite far from zero. Thus, in
the beginning the dual control allocates some energy that is not directly
intended for the control objective.
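The observation that the known-parameter control starts small and grows toward the terminal time can be reproduced with a short calculation. The sketch below assumes the cost has the form (c/2)[x(N) - rho]^2 + (1/2) sum u^2(k) and uses the standard linear-quadratic result that, with additive noise and known parameters, the control is u(k) = g(k) [rho - a^(N-k) x(k)]. The helper names are illustrative and not taken from the report.

```python
def known_parameter_gains(a=0.8, b=0.5, c=100.0, N=20):
    """Gains g(k) in u(k) = g(k) * (rho - a**(N-k) * x(k)), parameters known."""
    h = [b * a ** (N - 1 - k) for k in range(N)]       # effect of u(k) on x(N)
    gains = []
    for k in range(N):
        G_k = sum(hj * hj for hj in h[k:])             # control authority left from k on
        gains.append(c * h[k] / (1.0 + c * G_k))
    return gains

def deterministic_cost(a=0.8, b=0.5, c=100.0, N=20, rho=5.0, x0=0.0):
    """Minimum cost of the noise-free problem under the assumed cost form."""
    G0 = sum((b * a ** (N - 1 - k)) ** 2 for k in range(N))
    e0 = rho - a ** N * x0                             # miss if no control is applied
    return e0 * e0 / (2.0 * (G0 + 1.0 / c))

gains = known_parameter_gains()
print(gains[0], gains[-1])        # small gain at k = 0, large gain at k = N-1
print(deterministic_cost())       # noise-free part of the known-parameter cost
```

Because a < 1, an early control action is attenuated by the time it reaches x(N), so the gain rises monotonically with k. The C.E. strategy uses the same formula with the current estimates in place of a and b, which is why it also defers its effort to the end.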

In Fig. 5.2, the evolution of the parameter estimates is plotted for
one sample run. As the figure shows, this energy is utilized for
learning, which indicates that in the initial period, achieving the
control objective and learning are in conflict. For $k \ge 12$, the control
energy builds up in order to achieve the control objective. Since
large control energy will excite the modes and improve the signal-to-noise
ratio, it will promote learning. Thus for $k \ge 12$, learning and controlling
are not in conflict. This explains why the C.E. control does have good
estimates at the terminal time. In this case, learning is "accidental."
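The link between control magnitude and learning can be seen in a deliberately simplified setting. The sketch below assumes a static observation z = b u + noise with a Gaussian prior on the gain b; this is a simplification of the scalar example, not the report's filter. The noise variance 0.29 corresponds to q + r if the term a x(k) were known, and the prior variance 1.0 is a hypothetical value.

```python
def posterior_gain_variance(prior_var, u, noise_var):
    """Posterior variance of b after observing z = b*u + noise (Gaussian prior)."""
    return prior_var * noise_var / (noise_var + u * u * prior_var)

for u in (0.0, 0.5, 2.0, 5.0):
    # Larger |u| raises the signal-to-noise ratio and shrinks the variance.
    print(u, posterior_gain_variance(prior_var=1.0, u=u, noise_var=0.29))
```

With u = 0 the measurement carries no information about b and the variance is unchanged, which is the situation the C.E. control is in during the early period.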

To  illustrate  this  point,  we  will  see  what  happens  if  learning  is  not 
possible  in  the  final  period.  This  is  shown  in  case  2.  Table  5.2  shows  the 
results  of  artificially  terminating  the  final  learning  period.  The  C.E. 
control  does  very  poorly  in  estimation,  and,  consequently,  very  poorly  in 
achieving  the  control  objective.  This  is  reflected  by  the  very  large 
average  cost  and  its  standard  deviation.  The  dual  control,  on  the  other 
hand,  still  performs  reasonably  well  due  to  anticipation  of  the  open-loop 
period  at  the  end. 


In Fig. 5.3, the control histories of the dual control for case 1 and
case 2 are plotted for another sample run with identical noise sequences.
Note that at the initial time, more energy is allocated to learning in
case 2 than in case 1. This is so because in case 2, the derivation of the
dual control takes into account that no learning is possible for $k \ge 15$,
and thus any large control at the end will not help in learning.
Therefore, in order to achieve good control performance, a large amount of
energy must be invested for pure learning purposes during the initial
period to excite the system and to improve the signal-to-noise ratio. This
is illustrated by Figs. 5.3, 5.4 and 5.5. As a result, the dual control
achieves a much lower average cost and, at the same time, is a much more
reliable control strategy than the C.E. control.

This active learning characteristic is a distinguishing feature of the
dual control strategy, which depends not only on past observation
information but also on the future observation program; therefore, the
control value will differ depending on whether or not future observations
will be made. Note that such a feature is not possessed by any of the
existing suboptimal schemes suggested in the literature.

In the scalar example, a second order filter is used for on-line
estimation of state and parameters. This estimation scheme is quite
effective. Clearly, one may expect better performance if one uses a more
sophisticated estimation algorithm, e.g., via parallel filters [B3], [T3], [A3].
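To make the joint estimation of state and parameters concrete, the sketch below performs one predict/update cycle on the augmented vector z = [x, a, b]. It is a first-order extended Kalman filter, a simpler stand-in for the second order filter used in the report, and the initial covariance is a hypothetical diagonal matrix because the values of (5.4) are not reproduced here.

```python
def ekf_step(z, P, u, y, q=0.25, r=0.04):
    """One first-order EKF cycle for z = [x, a, b] with measurement y = x + noise."""
    x, a, b = z
    # Predict: the parameters are constant, the state follows x(k+1) = a x + b u.
    z_pred = [a * x + b * u, a, b]
    F = [[a, x, u], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # Jacobian of the dynamics
    FP = [[sum(F[i][m] * P[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    P_pred = [[sum(FP[i][m] * F[j][m] for m in range(3)) for j in range(3)]
              for i in range(3)]
    P_pred[0][0] += q                                   # process noise enters x only
    # Update with the scalar measurement, H = [1, 0, 0].
    S = P_pred[0][0] + r                                # innovation variance
    K = [P_pred[i][0] / S for i in range(3)]            # Kalman gain
    innov = y - z_pred[0]
    z_new = [z_pred[i] + K[i] * innov for i in range(3)]
    P_new = [[P_pred[i][j] - K[i] * K[j] * S for j in range(3)] for i in range(3)]
    return z_new, P_new

z0 = [0.13, 1.2, 0.3]                                   # initial guesses from the text
P0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical covariance
_, P_no_control = ekf_step(z0, P0, u=0.0, y=0.2)
_, P_control = ekf_step(z0, P0, u=2.0, y=0.8)
print(P_no_control[2][2], P_control[2][2])              # variance of the b estimate
```

With u = 0 the variance of the b estimate does not move, while a nonzero control reduces it; the dual control exploits exactly this dependence of the estimation error on the control.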

One  important  point  to  be  stressed  is  that  the  dual  control  strategy 
tries  to  improve  the  performance  by  considering  what  should  be  done 
before  as  well  as  after  the  parameters  are  identified,  whereas  the  C.E. 
control  strategy  only  tells  what  should  be  done  after  the  parameters  are 
identified. 

5.2  Interception  Example 

In  this  subsection,  the  interception  problem  will  be  investigated. 
Consider  a  third  order  system 

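$$
\begin{aligned}
x(k+1) &= A(\theta_1,\theta_2,\theta_3)\,x(k) + B(\theta_4,\theta_5,\theta_6)\,u(k) + \xi(k) \\
y(k) &= [\,0 \;\; 0 \;\; 1\,]\,x(k) + \eta(k)
\end{aligned}
\tag{5.5}
$$

where

$$
A(\theta_1,\theta_2,\theta_3) =
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \theta_1 & \theta_2 & \theta_3 \end{bmatrix} ;
\qquad
B(\theta_4,\theta_5,\theta_6) =
\begin{bmatrix} \theta_4 \\ \theta_5 \\ \theta_6 \end{bmatrix}
\tag{5.6}
$$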


and $\{\theta_i\}_{i=1}^{6}$ are unknown constant parameters with normal a priori statistics
having mean and variance

$$
\hat{\theta}(0|0) = [1.,\ -.6,\ .3,\ .1,\ .7,\ 1.5]' , \qquad
\Sigma^{\theta\theta}(0|0) = \mathrm{diag}(.1,\ .1,\ .01,\ .01,\ .01,\ .1) .
$$

The true parameters are

$$
\theta = [1.8,\ -1.01,\ .58,\ .3,\ .5,\ 1.]' . \tag{5.7}
$$

The initial state is assumed to be known:

$$
\hat{x}(0|0) = x(0) = 0 . \tag{5.8}
$$
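A small sketch of the model may help fix the roles of the six parameters. It assumes the companion form of (5.6), with the first three parameters in the last row of A and the last three forming B; the function names are illustrative only.

```python
def companion_system(theta):
    """Build A(theta1..3) and B(theta4..6) in the assumed companion form."""
    t1, t2, t3, t4, t5, t6 = theta
    A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [t1, t2, t3]]
    B = [t4, t5, t6]
    return A, B

def step(A, B, x, u):
    """Noise-free part of x(k+1) = A x(k) + B u(k)."""
    return [sum(A[i][j] * x[j] for j in range(3)) + B[i] * u for i in range(3)]

theta_true = [1.8, -1.01, 0.58, 0.3, 0.5, 1.0]     # true parameters, (5.7)
theta_prior = [1.0, -0.6, 0.3, 0.1, 0.7, 1.5]      # a priori mean

x0 = [0.0, 0.0, 0.0]                               # system initially at rest
x_true = step(*companion_system(theta_true), x0, 1.0)
x_pred = step(*companion_system(theta_prior), x0, 1.0)
print(x_true, x_pred)
```

Starting from rest, a unit control moves the state directly to B, so the first measurement of the third state component already carries information about the sixth parameter, while the first three parameters have no effect until the state is nonzero.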

The objective is to bring the third component of the state to a desired value.
This is expressed by the cost

$$
J = \frac{1}{2}\, E\Big\{ [x_3(N) - \rho]^2 + \lambda \sum_{i=0}^{N-1} u^2(i) \Big\}
\tag{5.9}
$$

where $\rho$ is some value and $\lambda$ is chosen to be small. In our example $\rho = 20$ and $\lambda$ is
chosen to be $10^{-3}$. The noises $\{\xi(k)\}$ and $\{\eta(k+1)\}$ are assumed to be
independent and are normally distributed with zero mean and unit variance.

If we interpret $x_3$ as the position of an object, then this example corresponds
to an interception problem: the guidance of an object to reach a certain
point, without constraints on the velocity and acceleration of the object
when it reaches that point. The difficulty lies in the fact that the
poles and zeroes of the system are both unknown. The initial condition (5.8)
represents the fact that the system is initially at rest.

Twenty Monte Carlo runs were performed on the interception example and
average performances are summarized in Table 5.3 and Figs. 5.6-5.8. The
performance for the optimal control when all the parameters are known is
included to serve as a lower bound for the truly optimum performance for
this problem. Again, it should be emphasized that this lower bound is
unachievable even by the optimum stochastic controller for the system with
unknown parameters.

As shown in Table 5.3, the dual control performance is an order of
magnitude better than that of the C.E. control (average cost 14 versus 114).
The second and third rows indicate that the dual control performance is
highly predictable compared with the C.E. control. Note that the dual
control uses only about twice the energy of the C.E. control while achieving
a dramatic improvement in the miss distance squared. This indicates that the
dual control does use control energy at appropriate times to improve learning,
and thus achieves a satisfactory control objective.
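The entries of Table 5.3 can be cross-checked against the cost definition. The sketch below assumes that the tabulated "weighted cumulative control energy" already includes the factor lambda/2, so that the average cost should be roughly half the expected miss distance squared plus that energy; the numbers are copied from the table and the dictionary layout is illustrative.

```python
table_5_3 = {
    "optimal, known parameters": {"cost": 6, "miss_sq": 12, "energy": 0.1},
    "C.E.": {"cost": 114, "miss_sq": 225, "energy": 1.4},
    "dual": {"cost": 14, "miss_sq": 22, "energy": 3.2},
}

def implied_cost(row):
    """Cost implied by J = (1/2) E[miss^2] + E[weighted control energy]."""
    return 0.5 * row["miss_sq"] + row["energy"]

for name, row in table_5_3.items():
    print(name, row["cost"], implied_cost(row))
```

Under this reading the three rows agree to within rounding, which shows that the C.E. cost is dominated by the terminal miss, whereas the dual control pays a modest amount of extra energy to remove most of it.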

Note that in Fig. 5.8, the dual control invests considerable energy in
learning at the beginning. The effect of this is revealed in Figs. 5.6
and 5.7, where the average squared errors of the parameter estimates are
displayed. Note that the learning in $\theta_4$, $\theta_5$, and $\theta_6$ is much faster than the
learning in $\theta_1$, $\theta_2$, and $\theta_3$. As discussed in Section 3.4, the learning of
$\theta_4$, $\theta_5$, and $\theta_6$ results from the fact that a large control will improve the
signal-to-noise ratio for these parameters, and thus the control can help
in learning them in one step; on the other hand, the learning of $\theta_1$, $\theta_2$,
and $\theta_3$ is accomplished by exciting the modes of the system, and thus
learning is delayed until the system is properly excited.

Note that the C.E. control provides fairly good learning in $\theta_1$, $\theta_2$,
and $\theta_3$, but practically no learning in $\theta_4$, $\theta_5$, and $\theta_6$. Note also that the
C.E. control builds up energy very quickly after the tenth step. As observed
in Fig. 5.7, some learning in $\theta_4$, $\theta_5$, and $\theta_6$ is performed for $k \ge 10$, but




TABLE 5.3

SUMMARY OF RESULTS FOR THE INTERCEPTION EXAMPLE

CONTROL POLICY                                             OPTIMAL CONTROL    C.E. CONTROL    DUAL CONTROL
                                                           WITH KNOWN         WITH UNKNOWN    WITH UNKNOWN
                                                           PARAMETERS         PARAMETERS      PARAMETERS

AVERAGE COST                                               6                  114             14
MAXIMUM COST IN A SAMPLE OF TWENTY RUNS                    20                 458             53
STANDARD DEVIATION OF THE COST                             6                  140             16
EXPECTED MISS DISTANCE SQUARED                             12                 225             22
WEIGHTED CUMULATIVE CONTROL ENERGY PRIOR TO FINAL STAGE    .1                 1.4             3.2


FIGURE 5.7  AVERAGE ESTIMATION ERROR SQUARED IN $\theta_4$, $\theta_5$, $\theta_6$
FOR THE INTERCEPTION EXAMPLE


prior to k = 10, practically no control is applied and thus no learning is
done in $\theta_4$, $\theta_5$, and $\theta_6$. The learning in $\theta_1$, $\theta_2$, and $\theta_3$ before k = 10 is due
to the process noise, which serves as a random input that excites the modes
of the system. Thus in this case, the learning in $\theta_1$, $\theta_2$, and $\theta_3$ is quite
accidental; also, because this learning is too slow, it is of little use in
achieving the control objective.

5.3  Soft  Landing  Example 

Consider the same system with the same a priori conditions as discussed
in Section 5.2. The only difference is that instead of bringing only the
third component of the state to a desired value, the objective is to bring
the final state to a certain point in the state space. This is expressed by

$$
J = \frac{1}{2}\, E\Big\{ [x(N) - \rho]'\,[x(N) - \rho] + \lambda \sum_{i=0}^{N-1} u^2(i) \Big\}
\tag{5.10}
$$

where $\rho$ is a point in $R^3$ and $\lambda$ is chosen to be small. This may be interpreted
as a soft landing problem by selecting the $\rho$ vector to be

$$
\rho = [\,0 \;\; 0 \;\; 20\,]'
\tag{5.11}
$$

and $\lambda = 10^{-3}$. Comparing the results of this problem to those obtained in
Section 5.2 will provide more insight into the dual nature of the control.
Twenty Monte Carlo runs were carried out for the C.E. control, the dual
control, and the optimal control with known parameters. Again, the
last-mentioned serves as an unachievable lower bound to the optimum performance.
The results are summarized in Table 5.4 and Figs. 5.9-5.11.
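The difference between the two terminal objectives can be illustrated with a short sketch. The terminal state used below is hypothetical and chosen only to show that a state can satisfy the interception objective exactly while still missing the soft landing target.

```python
def miss_interception(x_N, rho=20.0):
    """Squared miss when only the third state component must reach rho."""
    return (x_N[2] - rho) ** 2

def miss_soft_landing(x_N, target=(0.0, 0.0, 20.0)):
    """Squared miss when the whole state must reach the target point."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x_N, target))

x_N = [3.0, -2.0, 20.0]            # hypothetical terminal state
print(miss_interception(x_N))      # 0.0: third component is on target
print(miss_soft_landing(x_N))      # 13.0: first two components are not at rest
```

Every state with third component equal to 20 is acceptable for interception, whereas only a single point is acceptable for the soft landing.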

Conceptually, the soft landing is a "harder" problem than the one
considered in Section 5.2. Here, we want to "hit" a point in the state space,
while in Section 5.2 we wanted to "hit" a surface. Therefore, it should be
expected that the average cost is higher than in the previous example.
This is seen to hold true, as shown in Tables 5.3 and 5.4, for the dual
control and the optimal control with known parameters. However, for the C.E.
control, it does not hold true. This may look strange at first sight,
but careful analysis of the simulation results will offer an explanation
for this.

TABLE 5.4

SUMMARY OF RESULTS FOR THE SOFT LANDING EXAMPLE

CONTROL POLICY                                             OPTIMAL CONTROL    C.E. CONTROL    DUAL CONTROL
                                                           WITH KNOWN         WITH UNKNOWN    WITH UNKNOWN
                                                           PARAMETERS         PARAMETERS      PARAMETERS

AVERAGE COST                                               15                 104             28
MAXIMUM COST IN A SAMPLE OF TWENTY RUNS                    35                 445             62
STANDARD DEVIATION OF THE COST                             9                  114             11
EXPECTED MISS DISTANCE SQUARED                             28                 192             32
WEIGHTED CUMULATIVE CONTROL ENERGY PRIOR TO FINAL STAGE    1                  7               12


FIGURE 5.10  AVERAGE ESTIMATION ERROR SQUARED IN $\theta_4$, $\theta_5$, $\theta_6$
FOR THE SOFT LANDING EXAMPLE

In the following, the results of this example are examined in more
detail; later, this example is compared with the one described in Section
5.2.

Table 5.4 indicates the improvement of the dual control over the C.E. control,
both in average performance and reliability. The terminal miss distance
squared for the dual control is very close to the unachievable lower bound
given by the optimal control with known parameters. To achieve this small
miss distance, the dual control invests considerable energy for learning
purposes. This can be seen in Fig. 5.11, where it is shown that a large
amount of energy is invested at the initial time to promote future learning.
As a result, the parameters are estimated very quickly (in about eight steps).
After the parameters are adequately learned, the dual control smoothly hits
the final point $\rho$ (see Fig. 5.11). Again, note the delay in learning the
parameters $\theta_1$, $\theta_2$, and $\theta_3$.

The C.E. control, on the other hand, being only passive in learning,
learns much more slowly, with the result that the terminal error is an order of
magnitude higher than that of the dual control. As a consequence, the miss
distance squared is substantially larger than that of the dual control. The
C.E. control learning in $\theta_1$, $\theta_2$, and $\theta_3$ is enhanced by the process noise,
whereas the learning in $\theta_4$, $\theta_5$, and $\theta_6$ is regulated by the control. In the
C.E. case, the control is very small in the initial period and builds up very
quickly after time eight. Notice in Fig. 5.10 that the C.E. control did
quite a bit of learning after time eight, but this learning is passive.

To understand the passive and active learning of the C.E. and dual
control, the results of the soft landing example and the previous example
will be compared. First compare the two C.E. controls. Note that the C.E.
control energy used in the soft landing example (we shall call this the
second example) is much more than that used in the interception example (we
shall call this the first example). Note from Figs. 5.9 and 5.11 that up to
about k = 12, the C.E. control uses about the same cumulative energy
for the two examples. The fact that the final mission is different has not
yet become important enough to change the control strategy. As a consequence,
the learning for both cases is almost the same up to this time. In the first
example, since the final destination is a surface, the controller can wait
almost until the final time to apply a control to achieve the control objective,
and therefore the C.E. control is still applying little energy after time
twelve. The learning of the parameters $\theta_4$, $\theta_5$, and $\theta_6$ is only slightly
improved. However, for the second example, since the final destination is a
point in the state space, the control must work "harder" to achieve its
objective (transferring from one point to another arbitrary point requires
three time units). Therefore, the control energy after time twelve increases
very quickly for the second example. This results in a much better estimate
of the gain parameters. Since the learning in the first example is poorer than
in the second example for the C.E. control, a higher cost is accrued in the
first example than in the second. Note that even though the second example is
a "harder" problem, a better performance value is obtained. This is primarily
because "accidental" learning is enhanced by the difficulty of achieving the
final mission.
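The remark that a point-to-point transfer requires three time units can be checked with the standard controllability test for a single-input system: an arbitrary target in a three-dimensional state space is reachable in three steps when the matrix [B, AB, A^2 B] is nonsingular, and not in fewer steps, since two control values span at most a plane. The sketch below applies this test to the true parameters of (5.7), assuming the companion form of (5.6); the helper names are illustrative.

```python
def mat_vec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.8, -1.01, 0.58]]
B = [0.3, 0.5, 1.0]

AB = mat_vec(A, B)
A2B = mat_vec(A, AB)
ctrb = [[B[i], AB[i], A2B[i]] for i in range(3)]   # columns B, AB, A^2 B

print(det3(ctrb))   # nonzero, so three control steps suffice to reach any state
```

This is why the C.E. control in the soft landing example cannot postpone all of its effort to the last step: it needs at least the last three control values, and in practice more, to place all three state components.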

For the dual control, the strategies for the two examples differ markedly
at the beginning, rather than at the end, of the control interval. The fact
that a different end condition has to be fulfilled is propagated from the
final time to the initial time. For the second example, the dual controller,
realizing that the final mission is much more difficult to achieve, decides
to invest more energy in the beginning, because learning is very important in
this case to achieve a satisfactory final objective. Note the "speed" of
learning in the second example compared with the first example (see Figs. 5.6,
5.7, 5.9, 5.10). The dual control regulates its energy in learning: in the
first example, where learning is less important, it does not insist on learning
by applying large controls in the beginning; in the second example,
learning is much more important and thus more energy is utilized for
learning. For both examples, the expected miss distances squared are
comparable; thus, the increase in cost over the interception example is primarily
due to the increase in cumulative input energy. This demonstrates the active
learning characteristic of the dual control.

5.4  Remarks 

(1) A comparison of the computation time required by the dual control
with that for the C.E. control gives some idea of the computational
feasibility of the proposed algorithm. For the scalar example, the
dual control requires, on the average, about twice as much time as
the C.E. control per time unit. Note that in this example, we
actually have a 3-dimensional problem (one state and two unknown
parameters). For the other two examples, it was found that the
computation time for the dual control is, on the average,
approximately seven to eight times that of the C.E. control. Here,
we actually have a 9-dimensional problem (three states and six
unknown parameters).

However, judging from the improvement over the C.E. control, the
extra computation time is worthwhile.

Note that the ratio of the dual control computation time to that of
the C.E. control increases with the dimension of the problem. This
is due to the fact that with higher dimension, the computation of
the approximate optimal cost-to-go is relatively more time consuming.
Thus for applications to classes of problems with high dimension,
some improvement of the present algorithm is needed.

(2) The C.E. control is actually a very crude suboptimal method. More
sophisticated algorithms have been suggested in the literature.
One suggested approach is to have weighted C.E. control. This
control is obtained by having a bank of Kalman filters tuned at
different parameter values which adequately cover the parameter set;
an "optimal" (which is actually a C.E.) control is generated for each
Kalman filter in the bank, and finally all these controls are
combined in a weighted manner. We must stress that this strategy
does not possess active learning, and thus we would expect
behavior similar to that in the C.E. control examples, probably
with some improvement.

As seen from the above examples, active learning is the main
characteristic which will yield a satisfactory performance, and
therefore one can predict, with high confidence, that assuming the
on-line estimation algorithm to be the same, such a suboptimal
algorithm will be inferior to the dual control presented above.
Moreover, if a serial computer is used, the computation time for
the weighted C.E. approach would be equal to L times the C.E.
approach computation time (both assume using the same estimation
algorithm), where L is the number of Kalman filters. If there are
six parameters, and each is quantized into only two levels, we
have a total of $2^6$ Kalman filters, and thus the computation time
is about 65 times the C.E. approach computation time; this is much
more time-consuming than the dual control approach.
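The growth of the filter bank with the number of parameters is easy to tabulate. The sketch below only counts filters for a full grid over the quantized parameters; `bank_size` is an illustrative name, and the comparison figure of seven to eight times is the dual control ratio quoted in remark (1).

```python
def bank_size(n_params, levels):
    """Number of Kalman filters when each parameter takes `levels` grid values."""
    return levels ** n_params

for levels in (2, 3, 4):
    print(levels, bank_size(6, levels))   # 64, 729, 4096 filters for six parameters

# On a serial computer the weighted C.E. cost scales with the bank size,
# compared with roughly 7 to 8 times the C.E. time for the dual control.
```

Even the coarsest grid of two levels per parameter already exceeds the dual control overhead by nearly an order of magnitude, and a finer grid grows exponentially in the number of parameters.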

The  use  of  parallel  computers  may  reduce  the  time  for  the  weighted 
C.E.  control  approach,  since  this  control  law  is  parallel  in 
structure.  On  the  other  hand,  careful  study  of  the  present  algorithm 
may  show  that  it  also  possesses  a  parallel  structure,  though  not  in  as 
obvious  a  manner. 

(3) The active learning feature of this algorithm distinguishes it
from the other approaches in the literature. The examples not only
demonstrate that the dual control gives good performance, but, more
importantly, they illustrate why it gives good performance.

(4)  The  present  algorithm  can  be  modified  and  refined  so  that  it  can 
eventually  become  feasible  for  real-time  computation  for  a  large 
class  of  problems.  This  is  discussed  further  in  the  next  section. 

(5) The present algorithm can be used as a base for evaluating and
comparing the performance of different ad hoc suboptimal algorithms.
Even though the algorithm is still suboptimal, because of its
active learning characteristic, it is felt that the algorithm is
quite close to yielding the optimum performance.

VI.  POTENTIAL APPLICATIONS AND SUGGESTIONS
FOR FUTURE RESEARCH

In  this  section,  different  classes  of  problems  that  are  potential 
fields  of  application  for  the  dual  control  theory  developed  in  this  study 
are  indicated,  and  suggestions  are  made  for  areas  of  future  research  that  are 
direct  extensions  of  this  work. 

6.1  Applications 

The  dual  control  theory  is  applicable  to  general  adaptive  control  problems 
where  learning  of  the  unknown  environment  and/or  the  state  and  parameters  of 
the  system  under  control  is  important  in  obtaining  a  good  control  strategy. 

The  concept  of  active  learning,  which  is  introduced  and  developed  in  this 
study,  is  most  important  for  these  problems.  Some  of  the  problems  that  might 
benefit  from  the  application  of  the  dual  control  theory  are  listed  below. 

(1)  Automatic  Landing  System  —  The  objective  here  is  to  bring  a  plane, 
approaching a land base, to land safely as quickly as possible.
This requires knowledge of the position, velocity, and acceleration
of the plane, as well as some unknown parameters (perhaps due to
battle damage, imperfect preflight adjustment of the autopilot
sensitivities, component degradation) in order to perform a safe
landing.

(2)  Interplanetary  Missions  —  Here,  learning  of  the  unknown  environmental 
parameters  is  needed  for  controlling  the  vehicle. 

(3)  Low  Altitude  Missions  —  In  the  final  stage  of  a  low  altitude  mission, 
an aircraft might want to fly higher to gain information; on the other
hand, it will then be more exposed to enemy detection. In this situation,
a tradeoff between gaining information and safety of the aircraft exists
and must be regulated.

(4) Homing Interception: A homing interceptor equipped with a
conformal array of an on-board radar is described in Fig. 6.1. From
the figure, it can be seen that there are larger measurement errors
for the head-on line-of-sight. After collecting information about
the position, velocity, and acceleration of a target from the
on-board radar, the homing interceptor is to guide itself in an attempt
to intercept the target. Thus, starting toward a target with
uncertain position and velocity, the interceptor must follow some
trajectory that will perform active learning in order to increase
the probability of successful intercept.

6.2  Future  Research 

The  results  obtained  in  the  initial  effort  toward  the  "practical" 
stochastic  control  theory  open  up  new  areas  for  future  research  where  the 
concept  of  actively  adaptive  control  should  play  a  central  role.  These  new 
areas  are  outlined  below. 

(1)  Improvement  of  Present  Solution  Procedure  —  The  method  developed 
in  this  study  is  still  not  practical  for  some  classes  of  problems 
where  the  dimensionality  of  the  state  and  control  vectors  is  large. 
Efforts  should  be  spent  in  modifying  and  improving  the  present 
method  so  that  it  becomes  tractable  for  a  much  larger  class  of 
problems.  The  approach  would  be  to  study  carefully  the  present 
method  and  use  it  as  a  reference  in  obtaining  simpler  algorithms 
which  retain  the  active  learning  feature. 

(2)  Free  End-Time  Problems  —  Only  fixed  end-time  problems  have  been 
considered in the present study. But in many practical situations,
e.g., interception and soft landing, the final time is not
prespecified but is chosen in some optimum manner. Therefore, after
having gained an understanding of the fixed end-time problem, the free
end-time problem should be studied. The concepts and tools developed
in the present study can be easily extended to become applicable
to the free end-time problem.

FIGURE 6.1  MEASUREMENT UNCERTAINTY MODEL


(3)  Control  and/or  State  Constraints  Problems  —  Throughout  this  study, 


no constraints on the control and the state were assumed. In
actual applications, however, this assumption should be relaxed.
Extension of the present results to this class of problems is not
straightforward, but the concept of active learning will be helpful
in both the formulation and the method of solution for this class of
problems.

(4)  Measurement  Control  Problems  —  A  large  class  of  control  problems, 
not  directly  covered  by  the  classical  theory  of  stochastic  control, 
is  the  measurement  control  problem.  This  class  of  problems  takes 
the  general  form  of  the  block  diagram  shown  in  Fig.  6.2.  The 
unique  feature  of  the  diagram  is  the  measurement  control  which 
specifies how and when measurements are made. If the plant and
the measurement systems are both linear and the cost is quadratic,
Meier et al. and Kramer showed that the optimum measurement
control affects only estimation and therefore solved the problem
via the separation principle. Such a result corresponds to the
classical stochastic control of linear systems with quadratic
criteria, where the optimal control has only a control purpose
and the problem can be solved by the separation theorem. In
the general nonlinear situation, both the measurement control
and the plant control have the dual properties of trying to improve
estimation and control. For this reason, the problem is called the
dual measurement control.

Examples of dual measurement control problems arise in many
Air Force applications. Three important cases occur when
there are constraints on the total number of measurements allowed,
when there are constraints on the types of measurements made, or
when there are costs associated with making measurements. In the
first situation, which occurs when there are only finite resources
available to make measurements and each measurement uses up a
given amount of resource, an optimal scheduling of measurements
in real time is sought. The second situation is illustrated by a
radar with limited peak and average power. Within these constraints
the quality of position (range) and velocity (doppler) measurements
can be traded off by varying the radar pulse shape. In the third
situation, a good example of measurement cost is when the use of
a radar will gain information about an enemy but will also give the
enemy information about the radar location. In this case the
information given to the enemy may be represented as a cost of
making the measurement that must be traded off against the benefits
of making those measurements.

For  this  class  of  dual  measurement  problems,  the  concept  of 
active  learning  is  very  important.  The  understanding  gained  in  the 
present  study  will  provide  a  fundamental  framework  for  future  study. 

(5)  Dual  Control  and  Input  Design  for  Identification — This  study 

was concerned with the controlling of a system where learning of
parameters is only an indirect objective. Part II of this contract
is concerned with learning unknown parameters where
controlling is only an indirect objective. Therefore, these two
separate problems are actually two faces of the same problem. It
would be of interest to investigate the interrelation of these two
problems.

VII. PUBLICATIONS UNDER THIS CONTRACT


The  following  publications  are  results  from  Part  I  of  this  contract. 

(1) "Dual Control of Stochastic Nonlinear Systems," by E. Tse,
L. Meier, and Y. Bar-Shalom (1971 IEEE Decision and Control
Conference, Miami Beach, Florida).

Abstract 

In  stochastic  control  of  nonlinear  systems,  estimation 
and  control  are  dependent — the  control,  in  addition  to  its 
effect  on  the  state  of  the  system,  affects  the  estimation 
performance.  A  method  for  obtaining  a  dual  control  sequence 
is  discussed  that  leads  to  a  one-step  optimization  problem 
and a control strategy called the one-step dual control.
An example problem is used to indicate the performance improvement
when using the one-step dual control instead of
the separation control policy.

(2)  "On  the  Dual  Control  of  Stochastic  Discrete-Time  Systems," 
by  E.  Tse,  A.  J.  Tether,  Y.  Bar-Shalom,  and  L.  Meier  (Fifth 
International  Hawaii  Conference  on  Systems  Science,  Honolulu, 
January  1972). 


Abstract 

The  dual  nature  of  the  control  for  stochastic  nonlinear 
systems  is  stressed  in  formulating  a  stochastic  control  problem. 
Two  methods  for  obtaining  dual  control  sequence  are  discussed. 
The  first  method  is  the  off-line  optimal  nominal  selection, 
the  second  is  called  the  one-step  optimal  dual  control.  An 
example is given which indicates that the one-step optimal dual
control  has  great  improvement  over  the  control  strategy  obtained 
by  imposing  separation. 


(3)  "Wide-Sense  Adaptive  Dual  Control  of  Stochastic  Nonlinear 
Systems," by E. Tse, Y. Bar-Shalom and L. Meier (to appear
in IEEE Trans. on Automatic Control).

Abstract 

A  new  approach  is  presented  for  the  problem  of  stochastic 
control  of  nonlinear  systems.  It  is  well  known  that,  except  for 
the  Linear-Quadratic  problem,  the  optimal  stochastic  controller 
cannot  be  obtained  in  practice.  In  general  it  is  the  curse  of 
dimensionality  which  makes  the  strict  application  of  the  principle 
of  optimality  infeasible.  The  two  subproblems  of  stochastic 
control,  estimation  and  control  property,  are  except  for  the 
Linear-Quadratic case intercoupled. As pointed out by Feldbaum,
in  addition  to  its  effects  on  the  state  of  the  system,  the  control 
also  affects  the  estimation  performance.  In  this  paper,  the 
stochastic  control  problem  is  formulated  such  that  this  dual  property 
of  the  control  appears  explicitly.  The  resulting  control  sequence 
exhibits  the  closed-loop  property:  it  takes  into  account  the  past 
observations  and  also  the  future  observation  program.  Thus  in 
addition  to  being  adaptive,  this  control  also  plans  its  future  learning 
according  to  the  control  objective.  Some  preliminary  simulation  results 
illustrate  these  properties  of  the  control. 

(4) "An Actively Adaptive Control for Linear Systems with Random
Parameters via the Dual Control Approach," by E. Tse and
Y. Bar-Shalom (submitted to 1972 IEEE Decision and Control
Conference; also to be reviewed for IEEE Transactions on
Automatic Control).

Abstract 

The problem of controlling a linear system with random
parameters is being considered. An algorithm is obtained which
seems to be appropriate in computational feasibility for this
class of problems. The algorithm possesses active learning
characteristics in the sense that it regulates its adaptation
(learning) in an optimum manner. Simulation studies are carried
out in terms of two third-order examples. The example problems
provide additional insight into the active learning characteristic
as compared to the passive learning possessed by certainty
equivalence and many other suboptimal algorithms.


The following publications are supported partially by this contract.

(1)  "Parallel  Computation  of  the  Conditional  Mean  State  Estimate  for 
Nonlinear  Systems,"  by  E.  Tse  (The  Second  Symposium  on  Nonlinear 
Estimation  Theory,  San  Diego,  1971). 

Abstract 

This  paper  discusses  an  approach  for  approximating  the 
conditional  mean  state  estimate  for  nonlinear  systems.  The 
approach  is  motivated  by  realizing  that  some  recent  advances 
in  computer  organization,  in  particular  parallel  processing, 
could  be  used  to  reduce  the  computation  time  if  the  problem 
is  appropriately  formulated.  It  is  shown  how  the  estimation 
problem  can  be  formulated  properly  so  that  this  advantage  can 
be  utilized.  Specific  approximation  methods  are  described  in 
some  detail. 

(2)  "Modal  Trajectory  Estimation  and  Parallel  Computers,"  by 
R.  E.  Larson  and  E.  Tse  (The  Second  Symposium  on  Nonlinear 
Estimation  Theory,  San  Diego,  1971). 

Abstract 

For  nonlinear  estimation,  different  estimation  methods 
are  appropriate  depending  on  the  estimation  criterion  being 
used;  and  different  sufficient  information  statistics  must 
be  updated  and  stored  in  real  time.  For  modal  trajectory 
state  estimation,  i.e.,  estimation  of  the  maximum  likelihood 
trajectory  in  state  space,  the  problem  can  be  solved  using 
the  idea  of  dynamic  programming;  in  this  case  the  optimal 
return  function  serves  as  the  sufficient  statistic.  Since 
there  are  a  number  of  parallel  operations  that  occur  in  the 
evaluation of the dynamic programming recursive formula, the
use of a parallel computer could greatly reduce the computer
time and memory required for obtaining the modal trajectory
estimate. The purpose of this paper is to discuss the
modal trajectory estimation method and how various algorithms
for implementing dynamic programming in a parallel processor
can be used to reduce the computational burden.

(3)  "The  Third  Order  Extended  Kalman  Filter,"  by  L.  Meier  (The 
Second  Symposium  on  Nonlinear  Estimation  Theory,  San  Diego, 
1971) 


Abstract 

The  Extended  Kalman  Filter  accurate  to  the  third  order 
about  a  nominal  is  derived  and  compared  to  the  extended 
Kalman  filter  accurate  to  second  order.  It  is  found  that 
to  be  accurate  to  third  order  the  covariance  equation  must 
be  solved  in  real  time;  whereas  for  second  order  accuracy 
it  may  be  solved  a  priori. 

(4) "Parallel Computation of the Modal Trajectory Estimate,"
by R. E. Larson and E. Tse (Fifth International Hawaii
Conference on Systems Science, Honolulu, January 1972).


Abstract 

For modal trajectory state estimation, i.e., estimation
of the maximum likelihood trajectory in state space, the
problem can be solved using the idea of dynamic programming.
The purpose of this paper is to discuss various algorithms
for implementing the dynamic programming equation on a parallel
computer. In particular, the following algorithms are examined:
Parallel States Algorithm, Parallel Noises Algorithm, and Parallel
States and Stages Algorithm.

(5) "Parallel Processing Algorithms for Modal Trajectory Estimation,"
by R. E. Larson and E. Tse (1972 JACC and to appear in IEEE
Transactions on Automatic Control).

Abstract 

For  modal  trajectory  state  estimation,  i.e.,  estimation  of 
the  maximum  likelihood  trajectory  in  state  space,  the  problem 
can  be  solved  using  the  idea  of  dynamic  programming.  Since 
there are a number of parallel operations that occur in the
evaluation  of  the  dynamic  programming  recursive  formula,  the 
use  of  a  parallel  computer  could  greatly  reduce  the  computer 
time  and  memory  required  for  obtaining  the  modal  trajectory 
estimate.  The  purpose  of  this  paper  is  to  discuss  the  modal 
trajectory  estimation  method  and  how  various  algorithms  for 
implementing  dynamic  programming  in  a  parallel  processor  can 
be  used  to  reduce  the  computational  burden.  In  particular, 
the  following  algorithms  for  implementing  dynamic  programming 
in  parallel  processors  are  examined:  Parallel  States  Algorithm, 
Parallel  Noises  Algorithm,  and  Parallel  States  and  Stages  Algorithm. 

Appendix A

THE OPTIMAL PERTURBATION CONTROL

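Denote the optimal incremental cost-to-go by

$$
\Delta J^*(Y^j, j) \triangleq \min_{\delta u(j)} E\Big\{ \min_{\delta u(j+1)} E\Big[ \cdots \min_{\delta u(N-1)} E\big( \Delta J_o(Y^N, N) \,\big|\, Y^{N-1} \big) \cdots \,\Big|\, Y^{j+1} \Big] \,\Big|\, Y^j \Big\} \tag{A.1}
$$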

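The alternating minimizations and expectations in the above reflect the
closed-loop property of the control. The principle of optimality leads to

$$
\begin{aligned}
\Delta J^*(Y^j, j) = \min_{\delta u(j)} \Big\{ & L'_{o,x}(j)\,\delta\hat{x}(j|j) + \tfrac{1}{2}\,\delta\hat{x}'(j|j)\,L_{o,xx}(j)\,\delta\hat{x}(j|j) \\
& + \phi'_{o,u}(j)\,\delta u(j) + \tfrac{1}{2}\,\delta u'(j)\,\phi_{o,uu}(j)\,\delta u(j) + \tfrac{1}{2}\,\mathrm{tr}\big[ L_{o,xx}(j)\,\Sigma_o(j|j) \big] \\
& + E\big[ \Delta J^*(Y^{j+1}, j+1) \,\big|\, Y^j \big] \Big\}
\end{aligned}
\tag{A.2}
$$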


The covariance $\Sigma_o(j|j)$ is propagated, independently of the perturbation
control, according to the extended Kalman filter equation. (See also
Section 3.2.)
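
The closed-loop property expressed by the nested minimizations and expectations in (A.1) can be illustrated on a toy two-stage problem. The Python sketch below is not taken from the report: the three-valued control set, the two-point noise, and the cost $(u_0 + w + u_1)^2$ are hypothetical choices, made only so that both orderings of the minimization and the expectation can be enumerated exactly.

```python
from itertools import product

CONTROLS = (-1.0, 0.0, 1.0)          # admissible values of u0 and u1 (assumed)
NOISE = ((-1.0, 0.5), (1.0, 0.5))    # (value, probability) pairs for w (assumed)


def cost(u0, w, u1):
    """Toy cost: the state x1 = u0 + w is reached, then (x1 + u1)^2 is paid."""
    x1 = u0 + w
    return (x1 + u1) ** 2


def closed_loop_cost():
    """min over u0 of E_w[min over u1 of cost]: u1 is chosen after w is seen."""
    return min(
        sum(p * min(cost(u0, w, u1) for u1 in CONTROLS) for w, p in NOISE)
        for u0 in CONTROLS
    )


def open_loop_cost():
    """min over (u0, u1) of E_w[cost]: both controls are fixed before w is known."""
    return min(
        sum(p * cost(u0, w, u1) for w, p in NOISE)
        for u0, u1 in product(CONTROLS, CONTROLS)
    )


print(closed_loop_cost(), open_loop_cost())
```

Under these assumptions the closed-loop value is 0 (the second control cancels the observed disturbance), while the open-loop value is 1 (the noise variance cannot be removed by controls fixed in advance). The gap is what the alternation of minimization and expectation captures.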

Take $\Delta J^*(Y^j, j)$ of the form

$$
\Delta J^*(Y^j, j) = g_o(j) + p_o'(j)\,\delta\hat{x}(j|j) + \tfrac{1}{2}\,\delta\hat{x}'(j|j)\,K_o(j)\,\delta\hat{x}(j|j) . \tag{A.3}
$$

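Substituting (A.3) into (A.2), the minimization of the right-hand side of (A.2)
is obtained by letting

$$
\frac{\partial}{\partial\,\delta u(j)} \Big[ \phi'_{o,u}(j)\,\delta u(j) + \tfrac{1}{2}\,\delta u'(j)\,\phi_{o,uu}(j)\,\delta u(j) + p_o'(j+1)\,\delta\hat{x}(j+1|j) + \tfrac{1}{2}\,\delta\hat{x}'(j+1|j)\,K_o(j+1)\,\delta\hat{x}(j+1|j) \Big] = 0 \tag{A.4}
$$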


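where, to second order,

$$
\begin{aligned}
\delta\hat{x}(j+1|j) = {} & f_{o,x}(j)\,\delta\hat{x}(j|j) + f_{o,u}(j)\,\delta u(j)
+ \tfrac{1}{2} \sum_{i=1}^{n} e_i\, \mathrm{tr}\big\{ f^{i}_{o,xx}(j) \big[ \Sigma_o(j|j) + \delta\hat{x}(j|j)\,\delta\hat{x}'(j|j) \big] \big\} \\
& + \sum_{i=1}^{n} e_i\, \delta u'(j)\, f^{i}_{o,ux}(j)\, \delta\hat{x}(j|j)
+ \tfrac{1}{2} \sum_{i=1}^{n} e_i\, \delta u'(j)\, f^{i}_{o,uu}(j)\, \delta u(j) .
\end{aligned}
\tag{A.5}
$$

Substitute (A.5) into (A.4) and retain terms only up to second order; the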


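optimum $\delta u^*(j)$ is given by

$$
\begin{aligned}
\delta u^*(j) = - \Big[ & \phi_{o,uu}(j) + f'_{o,u}(j)\,K_o(j+1)\,f_{o,u}(j) + \sum_{i=1}^{n} p_o'(j+1)\,e_i\, f^{i}_{o,uu}(j) \Big]^{-1} \\
& \cdot \Big\{ \Big[ f'_{o,u}(j)\,K_o(j+1)\,f_{o,x}(j) + \sum_{i=1}^{n} p_o'(j+1)\,e_i\, f^{i}_{o,ux}(j) \Big] \delta\hat{x}(j|j)
+ f'_{o,u}(j)\,p_o(j+1) + \phi_{o,u}(j) \Big\}
\end{aligned}
\tag{A.6}
$$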
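
As an intermediate step toward (A.6), the stationarity condition (A.4) can be written out explicitly. The following is a sketch under the assumptions that $K_o(j+1)$, $\phi_{o,uu}(j)$ and the $f^{i}_{o,uu}(j)$ are symmetric and that terms of second and higher order in the gradient are dropped; $p_o'(j+1)e_i$ denotes the $i$th component of $p_o(j+1)$.

```latex
\phi_{o,u}(j) + \phi_{o,uu}(j)\,\delta u(j) + f'_{o,u}(j)\,p_o(j+1)
+ \sum_{i=1}^{n} p_o'(j+1)e_i \left[ f^{i}_{o,uu}(j)\,\delta u(j) + f^{i}_{o,ux}(j)\,\delta\hat{x}(j|j) \right]
+ f'_{o,u}(j)\,K_o(j+1) \left[ f_{o,x}(j)\,\delta\hat{x}(j|j) + f_{o,u}(j)\,\delta u(j) \right] = 0
```

Collecting the terms in $\delta u(j)$ on one side and solving for $\delta u(j)$ gives an expression of the form (A.6).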


Substituting (A.6) and (A.3) into (A.2) (keeping only up to second order terms)
and equating terms in zeroth, first and second order of $\delta\hat{x}(j|j)$, one has,
by using the definition of $H_o(j)$, (3.10), the equations for $g_o(j)$, $p_o(j)$,
and $K_o(j)$:


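$$
\begin{aligned}
g_o(j) = {} & g_o(j+1) - \tfrac{1}{2}\, H'_{o,u}(j) \big[ H_{o,uu}(j) + f'_{o,u}(j)\,K_o(j+1)\,f_{o,u}(j) \big]^{-1} H_{o,u}(j) \\
& + \tfrac{1}{2}\, \mathrm{tr}\big\{ H_{o,xx}(j)\,\Sigma_o(j|j) + \big[ \Sigma_o(j+1|j) - \Sigma_o(j+1|j+1) \big] K_o(j+1) \big\} ; \\
g_o(N) = {} & \tfrac{1}{2}\, \mathrm{tr}\big[ \psi_{o,xx}\,\Sigma_o(N|N) \big]
\end{aligned}
\tag{A.7}
$$

$$
\begin{aligned}
p_o(j) = {} & H_{o,x}(j) - \big[ f'_{o,u}(j)\,K_o(j+1)\,f_{o,x}(j) + H_{o,ux}(j) \big]' \big[ H_{o,uu}(j) + f'_{o,u}(j)\,K_o(j+1)\,f_{o,u}(j) \big]^{-1} H_{o,u}(j) ; \\
p_o(N) = {} & \psi_{o,x}
\end{aligned}
\tag{A.8}
$$

$$
\begin{aligned}
K_o(j) = {} & f'_{o,x}(j)\,K_o(j+1)\,f_{o,x}(j) + H_{o,xx}(j) \\
& - \big[ f'_{o,u}(j)\,K_o(j+1)\,f_{o,x}(j) + H_{o,ux}(j) \big]' \big[ H_{o,uu}(j) + f'_{o,u}(j)\,K_o(j+1)\,f_{o,u}(j) \big]^{-1} \\
& \quad \cdot \big[ f'_{o,u}(j)\,K_o(j+1)\,f_{o,x}(j) + H_{o,ux}(j) \big] ; \\
K_o(N) = {} & \psi_{o,xx}
\end{aligned}
\tag{A.9}
$$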

The resulting optimum cost if $U_o(k, N-1)$ is selected is thus given by
(note that $\delta\hat{x}(k|k) = 0$)

$$
J^*[k, U_o(k, N-1)] = J_o(k) + \Delta J^*(Y^k, k) = J_o(k) + g_o(k) . \tag{A.10}
$$

To  stress  the  estimation  performance  reflected  in  J*[k,U  (k,N-l)],  define 
gQ(j) »j=k,k+l, • • • ,N  according  to 

s0(J)  -go0+l>  0)  +  «;i„(3)£0U+i)|0>Jia)]'1H0j„(j)  i 

g  (N)  «  0  . 


(A. 11) 


Then  by  (A. 7),  (A. 10),  (A. 11),  J*[k,U& (k,N-l) ]  can  be  expressed  alternatively 


as 


J*[M0(kfM-l)].-Jo(k)+go(k)+|tr{*o^iji)(N|N)+  ^  <Ho,xx(j)4>(^^ 

+  i^a+iljj-^a+iij+Di^cj+i))  .  (A.  12) 

In the one-step dual control consideration, for $j \ge k+1$, only perturbation
analysis will be carried out along the vth nominal; thus the cost of applying
u(k) can be approximated by
Idtu(k)]  «  E{^[u(k),k]+Ltx(k),k]+Jv(kfl)+gv(k+l)

+  ^  (k+1)  [£<k+l  I  k+1)  -  x^  (k+1)  ] 

+  |[x(k+l  |  k+1)  -Xy  (k+1)  ]  %  (k+1)  [x  (k+1 1  k+1) 

-  xv(k+l)]|Yk}  .  (A. 13) 

Equation (3.28) can now be obtained by noting that

E{^[u(k),k]+Jv(k+l)  +  gv(k+l)lYk}  «  <^[u(k),k] +Jv(k+l)+gv(k+l) 

E{£^(k+l)(x(k+l|k+l)-xv(k+l)ljYk}  =  (k+1)  [x (k+1 1 k)  -  Xy  (k+1)  ] 

E{ [x (k+1 1 k+1)  -  (k+1] ' (k+1) [x(k+l |k+l)  - x^k+l)]  |Yk) 

*  [x (k+1 1 k)  -  x^  (k+1)  ]  (k+1)  [x  (k+1  |k)  -x^  (k+1)  ] 

+  tr(K  (k+1)  [I(k+l]k) -j:(k+l|k+l)]}  .  (A. 14) 
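
The last relation in (A.14) rests on the standard identity $E[e'Ke] = m'Km + \mathrm{tr}(KC)$ for a random vector $e$ with mean $m$ and covariance $C$. In (A.14) the role of $e$ is played by the deviation of the updated estimate from the nominal, conditioned on $Y^k$; its mean is the deviation of the predicted estimate, and its covariance is taken to be the difference between the predicted and updated covariances. The Python sketch below checks the identity exactly on a small discrete distribution; the points, probabilities and weighting matrix are hypothetical numbers chosen only for illustration.

```python
def quad_form(v, K):
    """Return v' K v for a vector v (list) and a square matrix K (list of rows)."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


def mean_and_cov(points, probs):
    """Mean vector and covariance matrix of a discrete distribution."""
    n = len(points[0])
    mean = [sum(p * x[i] for x, p in zip(points, probs)) for i in range(n)]
    cov = [
        [
            sum(p * (x[i] - mean[i]) * (x[j] - mean[j]) for x, p in zip(points, probs))
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mean, cov


def trace_product(K, C):
    """Return tr(K C) without forming the product matrix."""
    n = len(K)
    return sum(K[i][j] * C[j][i] for i in range(n) for j in range(n))


# Hypothetical two-dimensional example.
points = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
probs = [0.5, 0.25, 0.25]
K = [[2.0, 0.5], [0.5, 1.0]]

lhs = sum(p * quad_form(x, K) for x, p in zip(points, probs))   # E[e' K e]
mean, cov = mean_and_cov(points, probs)
rhs = quad_form(mean, K) + trace_product(K, cov)                # m' K m + tr(K C)
print(lhs, rhs)
```

The trace term is what carries the estimation performance into the cost: it is the only place where the covariances enter.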

Appendix B


THE QUADRATIC FIT OPTIMIZATION METHOD


The method to find the minimizing argument $u^*$ of a convex function $f(u)$
with a quadratic fit is described below. At the $k$th iteration one has the
function evaluated at three points $u_i^{(k)}$, $i = 1,2,3$. The corresponding values
are

$$
f_i^{(k)} \triangleq f\big(u_i^{(k)}\big) . \tag{B.1}
$$

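Assume these points are ordered such that

$$
u_1 < u_2 < u_3 . \tag{B.2}
$$

The convexity condition is

$$
f_2 < \frac{(u_2 - u_1)\,f_3 + (u_3 - u_2)\,f_1}{u_3 - u_1} . \tag{B.3}
$$

The quadratic fit has its minimum at

$$
u_4 = F\{u_i, f_i,\ i = 1,2,3\} \tag{B.4}
$$

where

$$
F\{u_i, f_i,\ i = 1,2,3\} = \frac{1}{2}\,\frac{f_{12}\,a_{13} - f_{13}\,a_{12}}{f_{12}\,b_{13} - f_{13}\,b_{12}} \tag{B.5}
$$

and

$$
f_{ij} = f_i - f_j \tag{B.6}
$$

$$
a_{ij} = u_i^2 - u_j^2 \tag{B.7}
$$

$$
b_{ij} = u_i - u_j . \tag{B.8}
$$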
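
The convexity test (B.3) and the minimizer of the fitted parabola, (B.4) through (B.8), can be sketched in a few lines of Python. The function names below are illustrative and do not come from the report; the formula is the standard three-point parabolic interpolation written with the differences $f_{ij}$, $a_{ij}$ and $b_{ij}$.

```python
def is_convex_triple(u, f):
    """Convexity condition (B.3) for ordered abscissas u[0] < u[1] < u[2]."""
    u1, u2, u3 = u
    f1, f2, f3 = f
    return f2 < ((u2 - u1) * f3 + (u3 - u2) * f1) / (u3 - u1)


def quadratic_fit_minimum(u, f):
    """Minimizer u4 of the parabola through (u_i, f_i), i = 1, 2, 3."""
    u1, u2, u3 = u
    f1, f2, f3 = f
    f12, f13 = f1 - f2, f1 - f3                        # f_ij = f_i - f_j
    a12, a13 = u1 ** 2 - u2 ** 2, u1 ** 2 - u3 ** 2    # a_ij = u_i^2 - u_j^2
    b12, b13 = u1 - u2, u1 - u3                        # b_ij = u_i - u_j
    return 0.5 * (f12 * a13 - f13 * a12) / (f12 * b13 - f13 * b12)


# Example: f(u) = (u - 2)^2 + 1 sampled at u = 0, 1, 4.
u_pts = [0.0, 1.0, 4.0]
f_pts = [(x - 2.0) ** 2 + 1.0 for x in u_pts]
u4 = quadratic_fit_minimum(u_pts, f_pts)
print(is_convex_triple(u_pts, f_pts), u4)
```

For a function that is itself quadratic, as in this example, a single fit recovers the minimizer; for a general convex function the fit is repeated on an updated set of three points.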


The three values $f_i$ will satisfy one and only one of the following sets
of inequalities (this is a consequence of (B.3)):

$$
f_1 > f_2 \ \text{ and } \ f_3 > f_2 \tag{B.9}
$$

$$
f_1 > f_2 > f_3 \tag{B.10}
$$

$$
f_1 < f_2 < f_3 . \tag{B.11}
$$

If (B.9) is satisfied then

$$
u_1^{(k)} < u_4^{(k)} < u_3^{(k)} . \tag{B.12}
$$

The new point will in this case satisfy either

$$
u_1^{(k)} < u_4^{(k)} < u_2^{(k)} \tag{B.13}
$$

or

$$
u_2^{(k)} < u_4^{(k)} < u_3^{(k)} \tag{B.14}
$$

and the value of the function at the new point satisfies either

$$
f_4 < f_2 \tag{B.15}
$$

or

$$
f_4 > f_2 . \tag{B.16}
$$


The procedure to choose the new set of three points is as follows,
depending on which of Eqs. (B.9) through (B.11) is satisfied.

I. Equation (B.9) is satisfied:

If {(B.13) and (B.15)} or {(B.14) and (B.16)} are satisfied, then the
new set of three points is

$$
U^{k+1} \triangleq \{u_i^{(k+1)}\}_{i=1,2,3} = \mathrm{ord}\{u_j^{(k)}\}_{j \ne 3} \tag{B.17}
$$

where ord{·} stands for ordering in the sense of (B.2).

If {(B.13) and (B.16)} or {(B.14) and (B.15)} are satisfied, then

$$
U^{k+1} = \mathrm{ord}\{u_j^{(k)}\}_{j \ne 1} . \tag{B.18}
$$

II. Equation (B.10) is satisfied. The new set is given by (B.18).

III. Equation (B.11) is satisfied. The new set is given by (B.17).

The search will stop when

$$
|u_4 - u_2| < \epsilon(u_4) = \max[\,c_1 |u_4|,\ c_2\,] . \tag{B.19}
$$

The algorithm can be summarized as follows:

1.  Given  the  first  three  points  (ordered),  evaluate  the  corresponding 
values  of  f. 

2.  a.  Set  k  =  0 

b.  Check  for  convexity  (B.3).  If  convex,  go  to  3. 

Otherwise 

|  I 

u4  =  u2  +  2  I  u3  “  u2  I  s8n  (J2~  “V  * 

c. Set k ← k+1. Go to 2b.

3. Compute $u_4$ using (B.4).

4. If (B.9) is satisfied and also (B.19), set $u^* = u_4$ and
exit. Otherwise evaluate $f_4$.

5. If (B.9) is satisfied use procedure I and then go to 2.
Otherwise go to 6.

6. If (B.10) is satisfied use (B.18) and then go to 2.
Otherwise go to 7.

7. Use (B.17) and then go to 2a.
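
The steps above can be sketched as a short Python routine. This is an illustrative reading of the procedure, not the report's implementation: it assumes that f is strictly convex, so that the convexity check of step 2b always passes and the expansion branch of that step is not needed; the function name, the default tolerances and the iteration cap are assumptions.

```python
import math


def quadratic_fit_search(f, points, c1=1e-6, c2=1e-6, max_iter=100):
    """Three-point quadratic-fit search for the minimizer of a strictly convex f.

    `points` holds three distinct abscissas; c1 and c2 are the tolerances of
    (B.19). The non-convex branch of step 2b is not implemented here.
    """
    u = sorted(points)
    fv = [f(x) for x in u]
    for _ in range(max_iter):
        (u1, u2, u3), (f1, f2, f3) = u, fv
        f12, f13 = f1 - f2, f1 - f3
        den = f12 * (u1 - u3) - f13 * (u1 - u2)
        if den == 0.0:  # degenerate triple: stop with the best point found so far
            break
        # Minimizer of the fitted parabola, (B.4)-(B.8).
        u4 = 0.5 * (f12 * (u1 ** 2 - u3 ** 2) - f13 * (u1 ** 2 - u2 ** 2)) / den
        bracketed = f1 > f2 and f3 > f2                          # (B.9)
        if bracketed and abs(u4 - u2) < max(c1 * abs(u4), c2):   # (B.19)
            return u4
        f4 = f(u4)
        if bracketed:
            # Procedure I: keep the three points that still bracket the minimum.
            drop_right = (u4 < u2 and f4 < f2) or (u4 > u2 and f4 > f2)
        else:
            # Procedure II (f decreasing) drops the left point, III the right one.
            drop_right = not (f1 > f2 > f3)
        kept = list(zip(u, fv))
        kept.pop(2 if drop_right else 0)
        kept.append((u4, f4))
        kept.sort()
        u = [x for x, _ in kept]
        fv = [y for _, y in kept]
    return u[fv.index(min(fv))]


# Example: f(u) = exp(u) - 2u has its minimum at u = ln 2.
u_star = quadratic_fit_search(lambda x: math.exp(x) - 2.0 * x, [0.0, 1.0, 3.0])
print(u_star, math.log(2.0))
```

Each pass costs one new function evaluation, which is the main attraction of the method when evaluating f is expensive.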


Appendix C

THE APPROXIMATE OPTIMAL COST TO GO FOR LINEAR SYSTEM
WITH RANDOM PARAMETERS


Define $H_o(j)$ by

H0(j)  »  \  [^(j)  -  o(j)3>  wCjHx^j)  -  £.0)3  +|  X(j)  u2(j) 

+  2iO+1)  JLlj;  2.  (J)>  uq0)]  .  ^C,:L^ 

Partitioning $p_o$ into two parts, $p_o^x$ and $p_o^\theta$, of dimensions $n$ and $s$ respectively,
we have

Vj)  “i  f2oO)  "  P(j)3*  W(j)[Xo(j)  -  p(j)]  +  y  A(j)  u2(j) 

+  4  (j+1)  [Vj)  Vj)  +^o(j)  V-i)] 

+  4’^+1)  £(j)  £0(j)  .  (C.2) 


Using  the  formulae  in  Section  I  ,  we  have  the  following  partial  derivatives 


Ho,x(:})  “  “  £0)3  +  A^(j)  £^(j+l) 

Ho,0(3)  "  E  l £j[  £^(j+l)3  <J>  *°0(j)  +  b0(^  Uo(j>] 

+  D'(j)  pjjcj+D 

Ko,u(^  =  X(J>  u0(j)  +4’(j+l)  b^J) 

Ho,xx^)  = 

HO,X0(j)  "  E  £±  ^+1)  4(j) 

Vex^  -  h;,x0(^ 


(C.3) 

(C.4) 

(C.5) 

(C.6) 

(C.7) 

(C.8) 


Equations (4.11) - (4.20) are obtained by substituting (C.3) - (C.11) into
(A.7) - (A.9). Note that $p_o^\theta(j)$ does not appear in the computation of $K_o(j)$,
$p_o^x(j)$ and $g_o(j)$; therefore its equation is not given in Section 4.3.

Appendix D

PROOF OF EQUATION (4.36)


It can be easily seen from the end conditions in (4.20) and (4.31)
that (4.36) is satisfied for $j = N$. Now assuming that

jytj+u  -  £,a+i>  -  VJ+D^u-f-i)  j 

it  will  be  shown  that  (4.36)  holds.  From  (4.20)  and  (4.31),  one  has,  making 
use  of  (4.34)  and  (4.35) 


(D.1)


-  £,<3+1)1  +M0)2o0) 


-  Po0)*’0)^)(J+l)b()<jH»<J)u()(j)  +b;0)!^(j+l)  -£,«+!>])  •  0>.2> 


Inserting  (D.l)  into  (D.2)  and  then  using  (4.26)  yields 


jpj)  -  -  A^(j)u  -  w0(j)  ^(j+D^a^aj^a+D^a+D 


+  W(j)Xo4j)  -  P0(j)A;(j)K0(j+l)bo(j)A(j)uo(j) 


-  t^(j)u  -  jro(j)iy  i+D^d^aj^a+D^aj+wa)}  x/j) 


il  » 


■+  Aj(j)U  - ^(j^Cj+D^a^Cj)  -  iro(J)x(j)iKo(j+i)bo(j)u0(j) 


(D.3)


The last term above is equal to zero. This can be easily seen by
rearranging it and using (4.29):

A*(j)K,(j+l)b^(j)  {1  -  U0(j)Ib;(3)K,(j+l)b0(3)  +  Mj)l)  uo(J)  -  0 


(D.4)


Now, using (D.4) and (4.30) in (D.3) one immediately obtains (4.36), the
desired result. Notice that this result is independent of the control $u_o$.